I.  INTRODUCTION

As  this  quarter  brings  the  Packet  Radio  Project  into  a  new 
year,  it  also  brings  the  development  of  new  potentials  in  the 
station  software  being  designed  and  implemented  at  BBN.  Major
progress  in  defining  protocols  to  be  used  in  the  Packet  Radio
network  provides  the  framework  for  actual  communication  among  Packet 
Radio  devices.  Additionally,  software  implementation  of  these 
protocols  has  reached  pregnant  levels  of  function.  As  detailed  in 
the  section  on  the  TCP  and  the  gateway,  considerable  functional 
operation  of  those  station  modules  has  been  demonstrated  during  this 
quarter.  The  nature  of  progress  this  quarter  can  roughly  be 
described  as  finally  having  large  enough  and  functional  enough 
modules  that  we  can  now  begin  to  assemble  them  into  software  that 
performs  like  a  station. 

At  the  same  time,  both  continuation  of  basic  support  and
forward  looking  anticipation  of  design  issues  of  the  future  have 
been  pursued.  In  the  former  category,  maintenance  of  the  BCPL 
library  which  supports  the  higher  level  language  in  which  station 
functions  are  implemented  has  received  a  portion  of  our  efforts  this 
quarter.  Also,  enhancement  of  ELF,  the  operating  system  which 
provides  the  programming  environment  for  the  station  software,  has 
continued.  In  particular,  timing  primitives  were  installed  to 
facilitate  measurement  of  software  performance.  This  represents  a 
pleasant  new  direction  in  ELF  support  at  BBN.  Previously,  most  ELF 
development  and  support  effort  was  required  simply  to  obtain  a
functional  operating  system.  Now,  the  enhancement  of  ELF  serves  as
an  occasional  means  for  bettering  our  software's  performance  and  our 
ability  to  improve  that  performance. 

In  addition,  this  quarter  includes  the  initiation  of  serious, 
full-time  effort  on  the  control  process.  This  vital  portion  of 
station  software  has  received  only  passing  acknowledgement  and  vague 
description  until  now.  A  new  member  of  BBN's  Packet  Radio  group  has 
now  assimilated  the  history  and  context  of  the  project  and  has 
become  an  active  and  important  member  of  the  group.  Resolution  of 
protocol  issues  has  allowed  substantial  progress  in  design  of  the 
control  functions  to  be  implemented  in  the  prototype  station,  as 
described  in  the  section  on  the  control  process.

II.  MEETINGS

On  December  5  a  major  meeting  was  held  at  BBN  for  the  main 
purpose  of  discussing  protocol  issues.  The  Station  to  Packet  radio 
network  Protocol  (SPP)  had  been  under  discussion  for  several  months. 
Various  documents,  ranging  in  formality  from  PRTNs  through  network 
messages  to  informal  telephone  discussions  had  provided  a  rich 
groundwork  of  needs  and  design  concepts.  At  this  meeting  the 
various  needs  were  compared;  the  means  for  meeting  each  need  were 
compared  in  cost  and  effect  on  other  needs  and  capabilities.  Points 
of  difference  arising  from  the  differing  design  viewpoints  of  the 
different  contractors  were  aired.  As  a  result  of  this  meeting, 
agreement  was  reached  on  many  of  the  issues.  This  is  detailed  in 
the  section  on  the  control  process,  since  resolution  of  this  aspect 
of  Packet  Radio  network  operation  permitted  subsequent  progress  on 
the  control  process. 

The  December  5  meeting  also  addressed  station  design, 
documentation,  future  measurement  needs,  and  project  scheduling. 
During  this  quarter  several  telephone  conversations  with  Collins 
Radio  personnel  enhanced  the  utility  of  the  resolutions  of  that 
meeting.  Since  BBN  and  Collins  are  the  first  implementors  of  the 
SPP  protocol,  this  coordination  permitted  mutual  aid  and  design 
review.  We  were  also  involved  in  telephone  discussions  with  UCLA;
in  this  case  the  issues  were  the  needs  for  various  measurements,
both  in  general  and  specifically  those  which  the  control  process  may 
require  for  intelligent  supervision  of  the  network.

III.  PUBLICATIONS

Three  Packet  Radio  Temporary  Notes  were  published  and 
distributed  this  quarter: 

PRTN  159  -  "A  Proposal  for  Incremental  Routing" 

PRTN  162  -  "Routing  in  the  Initial  Packet  Radio  Network" 

PRTN  165  -  "Will  the  Real  SPP  Please  Stand  Up?" 

The  first  of  these,  PRTN  159,  is  an  outgrowth  of  the  rich  protocol 
development  at  the  December  5  meeting.  In  large  measure,  PRTN  159 
simply  documents  and  solidifies  ideas  presented  by  BBN  at  that
meeting.

As  discussed  in  the  section  on  the  control  process,  reaction  to 
and  review  of  PRTN  159  provided  an  insight  into  SPP  history  and 
evolution.  PRTN  162  was  issued  in  an  attempt  to  reach  a  new  vantage 
point  from  which  SPP  design  could  be  examined  more  globally.  From 
this  point,  several  alternatives  became  distinct;  after  presenting 
these,  PRTN  162  concludes  with  specific  recommendations  about  which 
alternatives  create  and  preserve  the  maximum  flexibility  for  the 
research  nature  of  the  prototype  Packet  Radio  network.  Because  we 
feel  an  informed  acceptance  of  some  design  strategy  is  essential, 
even  if  it  is  not  composed  of  the  alternatives  we  recommend,  we  have
taken  several  steps  to  put  mild  pressure  on  our  fellow  contractors 
to  review  and  react  to  this  PRTN. 

PRTN  165  was  issued  in  the  hope  that  the  December  5  meeting  had 
resolved  SPP  protocol  issues  as  fully  as  the  other  members  of  the 
Packet  Radio  Working  Group  wished;  that  publishing  the  actual
specification  was  the  only  remaining  task.  The  response  to  PRTN  165
proved  this  hope  to  be  naive.  We  found  that  a  number  of  design 
issues  were  misinterpreted  or  inappropriately  applied  to  the  network 
under  development.  We  found  that  extensive  cooperative 
negotiations,  with  SRI  in  particular,  were  necessary  and,  upon 
completion,  provided  fruitful  basic  material  for  another  round  of 
SPP  design.  While  not  issued  as  a  formal  publication,  the  text  flow 
between  the  east  and  west  coasts  on  this  issue  was  considerable,  and 
stands  as  a  further  contribution  to  the  Packet  Radio  literature.

IV.  STATION  GATEWAY

At  the  beginning  of  this  quarter,  the  gateway  had  been  coded 
and  the  sections  dealing  with  the  ARPANET  had  been  debugged. 
However,  the  sections  dealing  with  the  PR  net  could  not  be  debugged 
until  the  connection  process  was  written. 

During  the  quarter,  coding  and  debugging  of  the  connection 
process,  which  implements  SPP  in  the  station,  was  carried  on 
concurrently  with  SPP  protocol  discussions.  The  SPP  protocol  design 
was  issued  as  PRTN  #165  and  after  discussions  with  SRI  and  Collins 
in  Dallas,  this  protocol  was  finalized  as  the  protocol  for  use  in 
the  initial  LADs. 

As  the  connection  process  was  altered  to  incorporate  changes  in 
SPP,  sections  of  the  gateway  were  also  rewritten  to  conform  to  the
current  connection  process  implementation.  After  some  initial 
debugging  of  the  connection  process,  we  ran  the  TCP,  gateway  and 
connection  processes  in  order  to  debug  the  sections  of  the  gateway 
dealing  with  the  PRN.  By  the  end  of  the  quarter,  we  were  able  to 
demonstrate  use  of  the  gateway  and  connection  process  for  PRN  to  PRN 
communications.

At  this  time,  the  interface  between  the  connection  process  and 
the  various  "applications"  processes  --  debug,  measurement,  control
and  the  gateway  --  was  defined.  Testing  of  the  gateway  and
connection  processes  helped  to  clarify  this  interface,  and  the 
specification  is  now  detailed  enough  to  allow  initial
implementations  of  the  remaining  applications  processes.

The  configuration  used  for  debugging  the  connection  and  gateway 
processes  at  this  stage  was  as  illustrated  below.  The  link  test 
support  program  was  run  in  the  PRDU.  The  connection  process, 
gateway,  TCP  and  TCP  test  program  were  run  in  the  station.  The  TCP 
test  program  opens  a  connection  to  the  PR  station  via  a  call  to  the 
TCP.  Packets  addressed  to  the  station  are  generated  by  the  test 
program  and  passed  to  the  TCP  which  passes  the  packets  to  the 
gateway.  On  receipt  of  a  packet  for  the  station,  the  gateway  calls 
the  connection  process  to  open  a  connection  to  the  station  and 
begins  sending  packets  over  this  connection.  The  connection  process 
sends  the  packets  out  through  the  IMP-11A  interface  to  the  PRDU 
where  the  link  test  support  program  loops  the  packets  back  to  the 
station.  On  receiving  a  packet  from  the  PRDU,  the  connection 
process  notes  from  the  PR  header  destination  field  that  it  is  for 
the  station  gateway  process  and  sends  it  to  the  gateway.  The 
gateway  notes  from  the  internet  destination  fields  that  the  packet 
is  for  the  "local"  Host  and  sends  it  to  the  TCP.  The  TCP  returns 
the  packets  to  the  test  program.  Upon  completion  of  all  data 
transfers,  the  gateway  notes  that  the  connection  is  no  longer  in  use 
and  signals  the  connection  process  which  closes  the  connection  by 
sending  a  FIN  packet.  When  this  FIN  packet  is  looped  back  to  the 
connection  process  by  the  PRDU,  the  connection  is  closed. 
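
The loopback path described above can be sketched as a toy model in Python. This is not station code: the function names, the dictionary packet layout, and the way the FIN is represented are assumptions made for illustration. Only the order of the hops follows the text.

```python
# Toy model of the loopback test path. All names are hypothetical.

def prdu_loopback(packet, trace):
    """Link test support program in the PRDU: return the packet unchanged."""
    trace.append("PRDU")
    return packet


def run_loopback_test(payloads):
    """Send payloads around the loop; return (received, trace, conn_open)."""
    trace = []
    received = []
    conn_open = False
    for data in payloads:
        trace.append("test->TCP")
        trace.append("TCP->gateway")
        if not conn_open:
            # On the first packet for the station, the gateway asks the
            # connection process to open a connection.
            conn_open = True
            trace.append("open")
        packet = {"pr_dest": "gateway", "inet_dest": "local", "data": data}
        trace.append("connection->IMP-11A")
        packet = prdu_loopback(packet, trace)
        # The connection process dispatches on the PR header destination.
        assert packet["pr_dest"] == "gateway"
        # The gateway dispatches on the internet destination field.
        assert packet["inet_dest"] == "local"
        trace.append("gateway->TCP")
        received.append(packet["data"])
    # All transfers done: the connection process sends a FIN, which is
    # itself looped back by the PRDU before the connection is closed.
    fin = prdu_loopback({"pr_dest": "connection", "fin": True}, trace)
    if fin.get("fin"):
        conn_open = False
        trace.append("closed")
    return received, trace, conn_open
```

In this model every data packet and the final FIN each cross the PRDU exactly once, and the connection is opened only on the first packet.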

V.  CONTROL  PROCESS

The  control  process  in  the  station  is  responsible  for  labeling 
(determining  how  packets  are  to  be  routed  through)  the  network. 
This  quarter  we  continued  our  study  of  the  protocols  governing  the 
processing  of  packets  by  PRs  (Packet  Radio  units)  as  they  relate  to 
labeling;  began  design  and  implementation  of  the  initial  version  of
the  control  process;  and  designed  manual  data  entry  facilities  to 
permit  exercise  of  other  station  functions  in  the  absence  of 
automatic  labeling. 

A.  Protocols 


The  following  were  among  the  issues  relevant  to  labeling  that 
were  resolved  or  clarified  as  a  result  of  our  December  4  meeting 
with  Collins: 


1)  Terminal  PRs  will  not  forward  normal  traffic;  thus  the
station  must  not  assign  routes  passing  through  them.  They
will,  however,  relay  ROPs  they  hear  to  the  station,  so  the
station  will  have  complete  connectivity  information
available.

2)  The  label  to  be  assigned  to  a  PR  will  be  contained  in  the
text  of  the  label  packet,  not  extracted  from  the  header.
Thus  the  PR  will  not  get  the  wrong  route  if  the  label  packet
is  rerouted  and  its  route  overwritten.

3)  A  packet  will  be  defined  to  unlabel  a  PR.  This  will  be
useful  to  the  station  for  eliminating  inconsistencies  by
reinitializing  the  offending  PR.

4)  The  text  of  ROPs  will  tell  whether  the  PR  is  labeled  and,  if
so,  what  its  labeling  is.

5)  PRs  will  never  spontaneously  unlabel  themselves.  They  only
become  unlabeled  due  to  manual  reinitialization  or  receipt
of  an  unlabel  command  from  the  station.

6)  A  special  protocol  for  handling  ROPs  allows  them  to  be
forwarded  by  all  PRs  that  hear  them,  not  just  those  at  a
particular  hierarchy  level.  Thus  the  station  can  assess  all
connectivity  from  a  PR  in  a  fraction  of  the  time  previously
required.

7)  A  probe  packet  will  be  defined  which  the  station  can  use  to
test  routes.  The  response  to  the  probe  will  tell  the
station  what  route  the  packet  actually  followed.

8)  All  hierarchy  levels  may  be  used  (formerly  one  was  reserved).
This  is  a  result  of  a  new  active  hop  acknowledgement
strategy  and  of  the  use  of  a  new  header  field  rather  than  a
delimiting  route  label  to  indicate  the  number  of  hops  in  a
packet's  route.

9)  ROPs  will  contain  a  few  performance  measures  made  by  the  PR  -
in  particular  the  number  of  inbound  packets  queued,
alternate-routed,  and  dropped.  The  intent  of  this  is  to
alert  the  station  to  problems  with  the  first  hop  of  a  PR's
route.  However,  since  the  inbound  packets  may  not  all  be
routed  along  this  hop  the  value  of  these  measures  is
questionable.
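
Item (1) constrains how the station may pick routes: a terminal PR can originate traffic but can never be an intermediate hop. A minimal Python sketch of that constraint follows. The connectivity table, the identifiers, and the use of a shortest-path search are all assumptions for illustration; the report does not say how the control process chooses among candidate routes.

```python
from collections import deque


def find_route(links, terminals, source, station):
    """Breadth-first search for a shortest route from source to station.

    links: dict mapping a device id to the set of ids it can reach directly
           (hypothetical form of the connectivity learned from ROPs).
    terminals: ids of terminal PRs, which never forward normal traffic
               and so may appear only as the first element of a route.
    Returns the list of ids from source to station, or None if no
    permitted route exists.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == station:
            return path
        if node in terminals and node != source:
            continue  # a terminal PR cannot relay, so do not extend through it
        for nxt in sorted(links.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With this rule a two-hop path through a neighboring terminal PR is rejected in favor of a longer path made up of forwarding PRs.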

We  have  devoted  a  lot  of  time  to  the  issue  of  what  a  PR  knows 
about  routing,  how  it  knows  it,  and  how  it  uses  its  knowledge. 


At  the  December  meeting,  a  change  proposed  by  Collins  was 
agreed  to  wherein  PRs  would  not  make  assumptions  about  fixed  sizes 
and  locations  of  labels  in  a  route.  Instead,  the  field  assignments 
would  be  centrally  determined  at  the  station,  which  would  inform  PRs 
of  the  location  of  only  their  own  field.  PRs  would  assume  that 
fields  appeared  in  order,  so  they  could  replace  the  remaining  route 
of  an  inbound  packet  if  desired.  As  before,  the  station  would  give 
PRs  a  complete  route  to  the  station. 


We  proposed  a  further  change  such  that  the  station  would  tell 
PRs  only  a  single  inbound  hop,  not  a  complete  route,  and  also  the 
location  of  the  inbound  route  field.  PRs  would  always  insert  the
next  hop  on  inbound  packets.  This  scheme  would  make  the  measures
described  in  (9)  above  refer  to  a  single  hop  and  would  minimize  the 
need  for  relabeling.  This  proposal  was  documented  in  PRTN  159,  "A
Proposal  for  Incremental  Routing."


Critical  feedback  on  PRTN  159  made  us  think  more  deeply  about 
the  issues  of  PR  route  knowledge.  We  came  to  feel  that  the  design 
process  was  too  haphazard:  changes  were  being  made  to  accomplish 
individual  goals  without  understanding  their  effect  on  other  goals; 
changes  which  were  actually  independent  were  being  lumped  together 
as  single  proposals.  As  a  result,  capabilities  were  being  thrown 
away  unnecessarily.  We  addressed  these  issues  in  PRTN  162,  "Routing
in  the  Initial  Packet  Radio  Network."  This  PRTN  attempted  to
separate  the  independent  decisions  which  were  made  in  the  above
proposals  and  show  how  each  decision  affected  the  capabilities  of 
the  PR  and  station.  It  ended  by  proposing  a  scheme  that  would 
retain  enough  flexibility  for  various  behaviors  to  be  tried.  In 
particular,  we  recommended  that  the  station  should  be  able  to  tell  a 
PR  any  amount  of  its  route,  ranging  from  a  single  hop  to  the  whole 
thing,  with  the  remainder  being  filled  in  as  necessary  en  route;
that  the  station  should  tell  the  PR  the  location  of  its  inbound
route  field  so  the  PR  could  make  decisions  based  on  the  hop  an 
inbound  packet  was  taking;  and  that  the  station  should  tell  the  PR 
the  location  of  the  set  of  inbound  route  fields  so  the  PR  could 
modify  the  route  without  making  assumptions  about  field  order.  This 
would  allow  features  of  both  the  Collins  and  BBN  schemes  above  to  be 
included.  The  recommendations  of  PRTN  162  are  still  under
consideration.
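
The idea of telling a PR "any amount of its route" can be illustrated with a small Python sketch. The representation is hypothetical: a route is modeled as a list of per-hop labels with None for a field not yet filled in, and the rule that a PR writes only into unfilled fields is an assumption of this sketch rather than something PRTN 162 is known to specify.

```python
def forward_inbound(route, hop_index, known_suffix):
    """Return (new_route, next_hop) for an inbound packet at one PR.

    route: list of labels, one per hop; None marks a hop not yet filled in.
    hop_index: position of this PR's own outgoing hop in the route, as
               told to the PR by the station.
    known_suffix: labels the station gave this PR for the hops from
                  hop_index onward (length 1 is a single hop, a longer
                  list is more of the route, up to the whole of it).
    Only unfilled fields are written, so hops already supplied by an
    earlier PR are preserved (an assumption of this sketch).
    """
    new_route = list(route)
    for offset, label in enumerate(known_suffix):
        pos = hop_index + offset
        if pos >= len(new_route):
            break
        if new_route[pos] is None:
            new_route[pos] = label
    return new_route, new_route[hop_index]
```

A PR that knows a single hop behaves as in the incremental routing proposal, while a PR that knows its whole remaining route behaves as in the earlier scheme, so both behaviors can be tried with one mechanism.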

B.  Control  Process 

Although  some  protocol  issues  still  remain  to  be  decided, 
enough  was  determined  during  this  quarter  to  permit  detailed  design 
of  the  control  process  to  begin.  The  initial  version  will  use  only 
those  facilities  that  are  completely  understood,  making  simple 
decisions  based  on  easily  obtainable  information  and  taking  simple 
actions.  This  initial  system  will  be  described  in  a  PRTN  to  be 
issued  soon.  Implementation  has  already  begun. 

C.  Manual  Data  Entry 

PRs  can  be  given  labels  by  direct  operator  input  at  their 
console  terminals.  We  have  designed  and  will  shortly  implement 
routines  for  manually  informing  the  station  of  the  IDs  of  devices  in 
the  network,  the  (manually-entered)  labeling  of  PRs,  and  the 
correspondence  between  non-PR  devices  (e.g.,  terminals)  and  their
attached  PRs.  This  will  enable  the  station  to  forward  packets  in  a 
test  network  before  a  control  process  that  performs  automatic 
labeling  is  available.

VI.  PDP-11  TCP  DEVELOPMENT

The  adaptation  of  the  TENEX  TCP  for  operation  on  a  PDP-11  under 
ELF  was  completed  during  this  quarter.  Its  proper  operation  was 
demonstrated  by  logging  into  TENEX  through  a  user  TELNET  running  in 
a  PDP-11  under  the  ELF  operating  system  through  the  PDP-11  TCP  and 
TENEX  TCP  and  TELNET  server.  A  message  announcing  this 
accomplishment  was  sent  using  Mailsys  to  a  number  of  interested 
parties.  The  PDP-11  TCP  has  also  been  used  to  transmit  test  data  to 
itself  using  a  test  program  which  opens  both  ends  of  the  connection 
and  sends  and  receives  a  number  of  "letters"  of  data.

Preliminary  measurements  of  the  operating  speed  of  the  PDP-11 
TCP  indicate  that  it  can  simultaneously  send  and  receive  5  packets 
per  second.  This  figure  was  obtained  using  very  short  packets  and 
measuring  the  amount  of  real  time  taken  to  transmit  a  given  number 
of  packets.  The  amount  of  idle  time  was  verified  to  be  virtually 
zero.  The  operating  speed  does  not  drop  appreciably  if  longer 
packets  are  used  indicating  that  the  limiting  factor  is  not  due  to 
the  transfer  of  data  from  buffer  to  buffer. 

The  initial  measurements  were  not  sufficiently  detailed  to 
indicate  the  reason  for  the  slow  performance,  so  steps  were  taken  to 
provide  more  elaborate  timing  measurement  facilities.  This  required 
changes  to  both  the  TCP  and  the  ELF  operating  system:  the  former  to
identify  the  CPU  time  required  to  perform  various  tasks  within  the
TCP,  and  the  latter  to  provide  the  facilities  to  obtain  the  CPU  time
consumed.

A  new  ELF  primitive  was  added  to  provide  the  total  CPU  time
consumed  by  a  particular  process  since  its  creation.  By  taking  the 
difference  between  the  result  of  executing  this  primitive  (CPUIM)
before  and  after  the  execution  of  a  particular  task,  the  CPU  time 
consumed  during  the  execution  of  that  task  was  obtained.  In  the 
process  of  debugging  the  new  primitive,  it  was  discovered  that  the 
ELF  time-of-day  clock  did  not  increase  monotonically.  Instead,  it
would  occasionally  produce  a  value  which  was  less  than  it  should 
have  been  by  a  certain  amount.  The  next  reading  would  usually  be 
correct.  The  malfunction  was  traced  to  a  bug  in  the  manner  in  which 
the  hardware  clock  was  being  read.  If  the  clock  counter  overflowed
without  being  reset  prior  to  being  read,  then  the  apparent  elapsed
time  since  the  last  clock  reset  would  be  small  by  the  amount  of  the
clock's  setting.  There  would  be  no  long  term  error,  however,  since 
the  pending  interrupt  would  take  as  soon  as  the  interrupts  were 
re-enabled  and  the  cumulative  time  would  be  updated  properly.  The 
fix  involved  detecting  that  the  overflow  had  occurred  and  adjusting 
the  value  obtained  accordingly. 
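
The clock bug and its fix can be modeled in Python. This is a simplified sketch, not the ELF code: the class, its fields, and the single overflow flag are assumptions, at most one overflow is assumed to be pending, and any race between reading the counter and reading the flag is ignored.

```python
class Clock:
    """Toy interval clock: a counter that wraps every `interval` ticks."""

    def __init__(self, interval):
        self.interval = interval
        self.counter = 0              # ticks since the last wrap
        self.overflow = False         # set on wrap, cleared by the interrupt
        self.cumulative = 0           # ticks accounted for by past interrupts
        self.interrupts_enabled = True

    def tick(self, n=1):
        """Advance the hardware counter by n ticks."""
        for _ in range(n):
            self.counter += 1
            if self.counter == self.interval:
                self.counter = 0
                self.overflow = True
            if self.overflow and self.interrupts_enabled:
                self.service_interrupt()

    def service_interrupt(self):
        """Clock interrupt: fold one full interval into the running total."""
        self.cumulative += self.interval
        self.overflow = False

    def enable_interrupts(self):
        """Re-enable interrupts; a pending clock interrupt is taken at once."""
        self.interrupts_enabled = True
        if self.overflow:
            self.service_interrupt()

    def read_buggy(self):
        """Original behavior: ignores an overflow not yet serviced."""
        return self.cumulative + self.counter

    def read_fixed(self):
        """Fixed behavior: detect the pending overflow and adjust for it."""
        pending = self.interval if self.overflow else 0
        return self.cumulative + self.counter + pending
```

If the counter wraps while interrupts are disabled, `read_buggy` returns a value that is low by one interval, while `read_fixed` stays monotonic. Once interrupts are re-enabled the two agree again, which matches the observation that there was no long-term error.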

The  debugging  of  the  new  timing  facilities  was  completed  as  the 
quarter  ended,  so  no  definitive  results  were  obl^ined,  but 
preliminary  indications  are  that  the  time  consumed  is  distributed 
fairly  uniformly  over  the  various  tasks.  Thus  the  prospects  are  not 
high  for  obtaining  a  dramatic  improvement.  Further  results  will  be
reported  next  quarter.

VII.  CROSS-RADIO  DEBUGGER

Design  and  coding  of  the  cross-radio  debugger  was  begun  this 
quarter.  The  cross-radio  debugger  will  permit  transmission  of  alter 
memory  (AM)  and  display  memory  (DM)  executable  code  packets  to  any 
selected  accessible  PR  in  the  network,  and  provide  informative 
printout  as  a  function  of  the  response  to  these  packets.  The 
response  to  a  DM  packet  will  contain  the  data  in  the  specified 
memory  locations;  this  will  be  printed  on  the  station  operator's 
terminal.  In  the  event  that  no  end-to-end  acknowledgement  is
received,  the  cross-radio  debugger  will  so  inform  the  operator.  In 
this  and  other  respects  of  basic  design,  the  cross-radio  debugger  is 
patterned  after  the  debugging  package  which  Collins  Radio  has 
implemented  for  sending  AM  and  DM  commands  from  a  PR  local  console 
to  either  the  PR  or  a  remote  PR. 
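
The display memory (DM) exchange described above might look like the following Python sketch. Everything here is hypothetical: the packet fields, the `send` callable standing in for the connection process interface, and the octal print format are assumptions, since the report does not give the packet layout or the printout format.

```python
def display_memory(send, pr_id, address, count):
    """Send a DM request and return the text to print for the operator.

    send: callable (pr_id, packet) -> response dict, or None when no
          end-to-end acknowledgement is received. It stands in for the
          connection process interface (hypothetical).
    """
    packet = {"type": "DM", "address": address, "count": count}
    response = send(pr_id, packet)
    if response is None:
        # Tell the operator rather than failing silently.
        return "PR %s: no end-to-end acknowledgement" % pr_id
    lines = []
    for offset, word in enumerate(response["data"]):
        # One line per memory word: address and contents in octal.
        lines.append("%06o/ %06o" % (address + offset, word))
    return "\n".join(lines)
```

An alter memory (AM) request would follow the same pattern, with the new contents carried in the packet and the response used only to confirm delivery.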

The  coding  of  the  cross-network  debugger  will  be  completed  in 
the  next  quarter,  as  will  be  its  testing  and  inclusion  in  the 
growing  collection  of  station  software.  The  solidification  of  the 
interface  between  the  connection  process  and  a  user  process  (the 
cross-network  debugger  in  this  case)  late  this  quarter  will 
facilitate  the  completion  of  this  task.

VIII.  SUPPORT  SOFTWARE

A.  PDP-11  BCPL  Library

The  library  for  support  of  BCPL  programs  running  under  ELF  was 
partially  rewritten  and  expanded.  The  rewrite  was  to  improve  the 
efficiency  of  terminal  I/O  and  to  permit  better  interlocking  of
output  from  various  processes  using  the  same  device.  The  expansion 
resulted  from  providing  routines  that  call  ELF  primitives  directly 
rather  than  using  the  ELFCAL  function. 

The  number  printing  routines  were  modified  to  permit  better 
control  of  format.  This  involved  the  addition  of  width  and  format 
arguments  to  the  WriteOct,  WriteN,  and  WriteNumber  functions. 
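
The effect of width and format arguments on a number printing routine can be shown with a short Python sketch. The function below is illustrative only; its name, argument order, and the zero-fill option are assumptions, and it does not reproduce the actual signatures of WriteOct, WriteN, or WriteNumber, which the report does not give.

```python
def write_number(n, radix=10, width=0, zero_fill=False):
    """Return n in the given radix, right-justified in `width` columns."""
    digits = "0123456789ABCDEF"
    if not 2 <= radix <= 16:
        raise ValueError("radix must be between 2 and 16")
    magnitude = abs(n)
    text = ""
    while True:
        magnitude, d = divmod(magnitude, radix)
        text = digits[d] + text
        if magnitude == 0:
            break
    sign = "-" if n < 0 else ""
    if zero_fill:
        # Pad with zeros between the sign and the digits.
        return sign + text.rjust(max(width - len(sign), 0), "0")
    # Pad with blanks to the left of the sign.
    return (sign + text).rjust(width)
```

A fixed width keeps columns of octal addresses and decimal counts aligned when several processes write to the same terminal.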

B.  Other  ELF  Changes 

In  addition  to  the  ELF  changes  described  above,  changes  were 
also  made  to  improve  the  action  taken  when  a  program  running  under  ELF
executed  an  illegal  instruction  or  otherwise  illegally  trapped.  The
principal  problem  was  that  the  registers  reported  after  the  trap
occurred  were  those  of  the  kernel  routine  that  fielded  the  trap
rather  than  those  of  the  user  program  executing  the  instruction
which  trapped.  A  secondary  problem  was  that  the  program  could  not 
be  restarted  in  any  way. 

This  was  remedied  by  making  the  routine  fielding  the  trap  take 
the  same  action  as  that  taken  when  an  EMT  is  executed.  This,  among 
other  things,  saves  the  contents  of  the  user  program's  registers  in
the  so-called  AC  block.  In  this  way,  they  are  accessible  to  the
cross-net  debugger  just  as  if  the  program  had  been  suspended  in  the
midst  of  executing  an  ELF  primitive. 

This  change  has  subsequently  facilitated  the  diagnosis  and 
correction  of  a  number  of  obscure  bugs  in  the  TCP  and  other 
programs.

IX.  PACKET  RADIO  DIGITAL  UNIT


During  this  quarter  further  debugging  of  the  Packet  Radio
Digital  Unit  (PRDU)  hardware  problem,  noticed  previously,  was
performed.  The  circumstances  and  nature  of  the  problem  were
catalogued  extensively.  Briefly,  the  problem  involves  the  PRDU
halting.  Once  halted,  there  is  very  little  which  can  be  determined
about  the  state  of  the  PRDU,  which  hampered  debugging  efforts.  The
halting  occurs  only  when  particular  software  in  the  PDP-11  is
transmitting  packets  to  particular  software  in  the  PRDU.  The  clock
rate  on  the  receive  DMA  in  the  PRDU  must  be  within  a  certain
critical  range.  At  settings  of  delay  less  than  the  critical  range,
a  second  problem  was  occasionally  noted.  This  second  problem
involves  the  PRDU  hanging  (no  further  input  accepted)  on  the  second
initiation  of  traffic  to  it  from  the  PDP-11.  The  final  recourse  was
to  take  a  complete  memory  dump  of  the  affected  CAP  and  I/O  routine
software  after  the  PRDU  had  halted,  and  forward  this  to  Collins
Radio  for  diagnosis.  At  about  the  same  time  that  Collins  personnel
decided  they  could  obtain  no  clues  from  the  memory  dump,  the
hardware  was  moved  to  a  new  building  at  BBN.  After  the  move,  the
halting  problem  did  not  seem  to  be  present,  although  the  hangup
problem  still  occurred  occasionally.  The  decision  was  made  to
postpone  further  work  on  the  problem  by  adjusting  the  clock  delay  to
a  large  time  interval,  at  which  neither  halting  nor  hangup  occurs.
With  this  resolution,  testing  and  provisional  acceptance  of  the
second  PRDU  is  complete.

X.  IMP-11A  INTERFACE

A  timing  bug  was  found  in  the  DEC  IMP11A  interface  hardware 
which  was  manifested  when  the  IMP11A  was  connected  to  the  Pluribus 
IMP  with  a  cable  of  the  appropriate  length  and  loss  characteristics, 
and  when  the  interface  was  operated  in  a  particular  manner.  The 
problem  was  traced  to  the  interface  occasionally  generating  a  short
pulse  (0  to  60  nsec)  on  the  ready  for  next  bit  line  going  to  the  IMP 
whenever  the  word  count  was  exhausted  without  receiving  a  last  bit 
signal  from  the  IMP.  This  usually  occurred  when  running  the  network 
bootstrap  program  but  not  during  normal  operation.  It  furthermore 
required  the  slightly  higher  speed  logic  of  the  Pluribus  IMP  and  a 
cable  that  would  transport  the  pulse  to  the  IMP  at  the  proper  time. 
The  pulse  originated  in  a  hazard  between  two  signals  making  a 
transition  caused  by  the  same  source.  The  "or"  of  the  two
signals  was  used  to  prevent  the  ready  for  next  bit  signal  coming  on. 
The  cure  was  to  generate  a  signal  equivalent  to  the  one  required  but 
without  any  holes  in  it. 
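
The kind of hazard described above can be illustrated with a small timing model in Python. The model is an assumption: it supposes that one signal falls and the other rises after a common source event, each with its own propagation delay. The report does not state the polarities or delays of the actual signals.

```python
def or_gap_ns(fall_delay_a, rise_delay_b):
    """Width in ns of the low gap ("hole") in (A or B).

    A goes high-to-low fall_delay_a ns after the common source event;
    B goes low-to-high rise_delay_b ns after it. Ideally the OR stays
    high throughout; it drops only while A is already low and B is
    not yet high.
    """
    return max(0, rise_delay_b - fall_delay_a)


def or_waveform(fall_delay_a, rise_delay_b, span):
    """Sample (A or B) once per ns for `span` ns after the source event."""
    samples = []
    for t in range(span):
        a = t < fall_delay_a      # A is still high before it falls
        b = t >= rise_delay_b     # B is high once it has risen
        samples.append(a or b)
    return samples
```

While the OR is low, the inhibited signal can escape as a short pulse whose width depends on the difference between the two delays, which is consistent with a pulse width that varies from unit to unit. A signal derived from a single transition has no such gap.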

This  modification  has  been  given  to  DEC  for  inclusion  in 
subsequent  IMP11A  interfaces  and  for  distribution  to  other  users  of 
the  interface.

I.  INTRODUCTION

In  the  last  quarter,  we  developed  a  new  formulation  for  linear 
prediction,  which  we  call  the  covariance  lattice  method.  The  method
is  one  of  a  class  of  lattice  methods  which  guarantee  the  stability
of  the  all-pole  linear  prediction  filter,  with  or  without  windowing
of  the  signal,  with  finite  wordlength  computations  and  with  the
number  of  computations  being  comparable  to  the  traditional
autocorrelation  and  covariance  methods.  We  incorporated  the
covariance  lattice  method  into  our  floating-point  simulation  of  the
LPC  speech  compression  system.  This  also  involved  "tuning"  of  such
quantities  as  analysis  interval  and  criterion  for  determining
optimal  LPC  order,  to  obtain  approximately  the  same  speech  quality 
as  that  from  our  earlier  1500  bps  LPC  system  (which  uses  the 
autocorrelation  method)  at  about  the  same  total  computational  time. 
In  fixed-point  implementations,  however,  the  guaranteed  filter 
stability  provided  by  the  covariance  lattice  method  might  lead  to  an 
improvement  in  speech  quality  relative  to  that  from  the 
autocorrelation  LPC  system. 

We  presented  a  summary  of  major  results  of  our  speech
compression  project  in  the  last  3  years  at  the  December  ARPA  Review
Meeting.  This  summary  was  also  issued  as  NSC  Note  77  and  is
reproduced  in  this  report  as  Appendix  A. 

Also  in  the  last  quarter,  we  provided  specifications  for 
ARPA-LPC  speech  compression  system  II,  an  update  of  the  present 
system  I.  The  system  II  as  specified  by  us  will  be  implemented  at
the  different  ARPA-sponsored  sites.

In  our  work  on  quality  evaluation  this  quarter,  we  have  run  a 
phoneme-specific  intelligibility  test  on  a  subset  of  five  of  the 
fourteen  LPC-vocoder  systems  we  studied  earlier.  The  analysis  of 
the  results  of  this  experiment  is  nearly  complete.  We  have  also 
analyzed  the  effects  of  lost  or  delayed  packets  on  speech 
intelligibility,  and  suggested  a  modified  way  of  packetizing  speech 
so  as  to  minimize  the  intelligibility  decrement.  The  suggestion, 
together  with  the  arguments  leading  up  to  it,  was  issued  as  NSC  Note 
#78,  and  is  reproduced  in  this  report  as  Appendix  D.

II.  COVARIANCE  LATTICE  METHOD  FOR  LINEAR  PREDICTION

The  covariance  lattice  method  is  a  hybrid  between  the 
covariance  method  and  traditional  lattice  methods.  The  new  method 
has  all  the  advantages  of  a  regular  lattice,  plus  the  added 
advantage  of  a  computational  efficiency  comparable  to  the 
non-lattice  methods. 

As  mentioned  in  the  introduction,  the  covariance  lattice  method 
is  one  of  a  class  of  lattice  methods  with  many  desirable  properties. 
The  formulation  of  these  lattice  methods  and  their  efficient 
computational  procedure  are  described  in  NSC  Note  75,  a  copy  of
which  is  attached  with  this  report  as  Appendix  B. 

A  program  with  spectral  and  waveform  display  capabilities  was 
written  for  use  from  our  IMLAC  PDS-1  display  terminal  to 
experimentally  study  the  covariance  lattice  method.  Using  this 
program,  we  verified  experimentally  the  results  analytically 
established  in  Appendix  B.  As  expected,  for  cases  where  the 
covariance  method  produced  an  unstable  linear  prediction  filter,  the 
covariance  lattice  method  produced  a  stable  filter.  In  addition, 
the  power  spectrum  of  the  stable  filter  was  found  to  be  a  reasonably 
good  fit  to  the  envelope  of  the  short-term  signal  spectrum.  A 
comparative  study  indicated  that  the  covariance  lattice  method 
resulted  in  estimates  of  pole  bandwidths  generally  larger  than  those 
obtained  from  the  covariance  method  and  generally  smaller  than  those 
given by the autocorrelation method.

Another study that we conducted using the interactive display
program was concerned with the length of the analysis interval for
the covariance lattice method. Longer intervals mean more
computations required in solving for the predictor parameters. With
analysis intervals shorter than a pitch period, the accuracy of the
power spectrum of the resulting linear predictor (relative to the
envelope of the short-term speech spectrum) was found to critically
depend on the location of the analysis interval relative to the
pitch pulses. Notice that an analysis scheme that requires
positioning of the analysis interval with respect to the location of
pitch pulses is basically a pitch-synchronous scheme. Since we have
not yet resolved all the issues relating to such frame positioning
and since we wish to keep the analysis simple for vocoder
application, we chose to employ a sufficiently long analysis
interval.

Our  next  step  was  to  incorporate  the  covariance  lattice  method 
into  our  floating-point  simulation  of  the  LPC  vocoder.  The 
introduction  of  the  new  analysis  scheme  necessitated  the  "tuning"  or 
adjustment  of  a  number  of  other  parameters.  They  were:  1)  length  of 
the  analysis  interval,  2)  criterion  to  determine  optimal  predictor 
order,  3)  log  likelihood  ratio  threshold  used  in  variable  frame  rate 
transmission,  and  4)  bit  allocation  for  log  area  ratios.  The  goal 
was to obtain approximately the same speech quality as that from our
earlier 1500 bps LPC system at about the same total computational
time and, of course, at the same average bit rate.

Table 2 lists the average bit rates for 5 different systems. System
5  was  found  to  produce  good  quality  speech,  approximately  the  same 
as  our  earlier  1500  bps  system,  at  about  the  same  total 
computational  time. 

In  fixed-point  implementations,  finite  wordlength  computations 
can  cause  filter  instabilities  with  the  autocorrelation  method.  The 
covariance  lattice  method  still  guarantees  filter  stability  as 
stated  earlier.  Therefore,  in  fixed-point  implementations,  the 
covariance  lattice  method  might  yield  better  quality  speech  than  the 
autocorrelation  method.  Furthermore,  as  stated  in  Appendix  B,  the 
covariance  lattice  method  permits  the  quantization  of  the  reflection 
coefficients  to  be  accomplished  within  the  recursion  for  retention 
of accuracy in representation. Such a quantization method might
also lead to an improvement in the quality of the synthesized
speech.

III.  SPECIFICATIONS FOR ARPA-LPC SYSTEM II

The  approach  we  employed  in  arriving  at  the  specifications  was 
to  reap  maximum  benefit  for  the  least  amount  of  effort  in  terms  of 
changes to the present System I. Our overall design objective was
to achieve average continuous-speech transmission rates of about
2200 bps. With the use of a silence detection algorithm, these
rates  may  drop  to  about  1000  bps  or  less. 

There  are  two  major  differences  between  System  I  and  II.  These 
are:  1)  Variable  frame  rate  transmission  of  LPC  parameters,  and 
2)  use  of  new  coding/decoding  tables  for  transmission  parameters. 
The  details  of  System  II  specifications  are  contained  in  NSC  Note  82 
which is included in this report as Appendix C.

V.  PHONEME-SPECIFIC INTELLIGIBILITY TESTS


A.  Purpose 


If  two  communications  systems  differ  noticeably  in 
intelligibility,  the  question  of  their  relative  quality  rarely 
arises.  As  a  result,  quality  comparisons  are  usually  performed  only 
on  sets  of  systems  that  have  equal  (and  usually  high) 
intelligibility.  It  has  often  been  argued  that  the  information 
obtained  from  quality  tests  could  better  be  obtained  from 
intelligibility tests, if the latter could only be made sufficiently
difficult that the scores dropped substantially below 100%. As an
extreme example, consider a pair of systems that both score 98% on
Intelligibility Test 'X'. Test 'X' is based on measuring the
intelligibility of a two-word vocabulary, consisting of the digits
'one' and 'two'. It is obvious that there might be considerable
differences in the quality of the speech passed by the two systems
that test 'X' would fail to detect. On the other hand, a more
difficult test, based perhaps on PB word lists, might well separate
the two systems.

The question of whether quality tests and intelligibility tests
are measuring the effects of the same variables is a very important
one. Quality tests are much more subjective than intelligibility
tests, since they require the subject to make a judgment, such as a
rating or a preference, for which there is no objectively correct
response. Consequently, the results of quality tests are heavily
dependent on the set of systems being compared, on the test subjects
and the instructions they are given, and on a variety of other
variables that are hard to control and hard to quantify. Nakatani
and Dukes (1971) have had some success in showing the equivalence
between quality measures and their 'Q-Measure' of intelligibility,
but  unfortunately  their  procedure  is  complicated  and  expensive  to 
run.  Furthermore,  the  quality  data  against  which  Nakatani  and  Dukes 
compared  their  Q-Measure  results  were  much  less  rich  in  detail  than 
the  quality  data  available  to  us,  as  a  result  of  the  quality  tests 
we  have  reported  in  earlier  QPR's.  Since  the  results  of  our  tests 
were  successful  in  providing  diagnostic  information  about  how  the 
vocoders  differed  in  quality,  it  was  considered  important  to  use  an 
intelligibility  test  that  was  capable  of  yielding  similar  diagnostic 
detail.  This  permits  a  much  more  detailed  comparison  of  the  two 
methods  than  if  a  simple  percent-correct  test  were  used.  For 
example,  it  makes  possible  the  use  of  the  same  multi-dimensional 
scaling  procedures  for  analyzing  both  sets  of  data.  The  results  of 
the  analyses  can  then  be  compared,  to  see  if  the  results  are  well 
described  by  a  single  psychological  structure.  This  is  a  procedure 
we  have  already  had  some  success  with,  as  described  in  BBN  Report 
No. 3209, where we showed that the rank-ordering task and the
rating task produce highly similar results in quality evaluation.


B.  The  Phoneme-Specific  Intelligibility  Test 

The  phoneme-specific  intelligibility  test  we  adopted  is  a 
development of one described by Stevens (1962). The test has two
parts, one for consonants and one for vowels. It is a
nonsense-syllable  test,  using  closed  response  sets  of  4—3  items. 
Both  of  these  factors  increase  the  difficulty  of  the  test  over  that 
of the Diagnostic Rhyme Test (DRT; Voiers et al., 1973), which is the
only  other  test  available  with  similar  diagnostic  power.  The  DRT 
measures  only  consonants,  and  only  in  initial  position,  and  the 
response  set  for  each  item  is  a  minimal  pair  of  English 
monosyllables.  The  Phoneme-Specific  Intelligibility  test  covers 
vowels  and  consonants  in  both  pre-stress  and  in  final  position.  The 
stimulus items are nonsense syllables of the form /ə'C1VC2/, where
/ə/ is an unstressed schwa like the first syllable of 'about', C1
and C2 are consonants, and V is a stressed vowel. The complete test
consists of 14 separate subtests. The first ten are consonant
tests, each of which uses a single closed set of consonants from
which C1 and C2 are drawn. There are four versions of each
consonant subtest, two of which use one pair of vowels as syllable
nuclei, and two using a second pair of vowels. A typical consonant
test list is shown in Figure 1. Each consonant in the closed
response set appears four times in each list, once preceding and
once following each of the two context vowels. In addition, there
are three filler items (ringed numbers in Figure 1) added to prevent
subjects from using the symmetry of the test to aid their
responding. The vowel tests are similar, except that each vowel
appears four times in each list, in symmetrical consonant context,
and there are three different sets of consonant contexts for each
vowel subtest. The complete test is summarized in Table 1, which
gives the response set and context sets for each of the ten
consonant subtests, and for each of the four vowel subtests.

Figure 1: A sample consonant test list. Each nonsense
syllable is preceded by an unstressed vowel,
and contains an initial and final consonant
drawn from the consonant response set, and a
vowel from the context vowel set. The ringed
items are fillers.

C.  Talkers  and  Recordings 

Two talkers each recorded one of the symmetrical halves of the
complete test. All lists with an 'M' in the title (see Table 3)
were read by the male talker, who had a low fundamental. (He was
speaker #3, DK, in the quality tests.) The lists with an 'F' in the
title were read by a female talker. Both had considerable
experience with phonetic symbols, and with recording techniques.
The lists were read in a sound-treated room, and were recorded with
a  boom-mounted  electret  microphone  (Thermo  Electron,  Model  5336), 
and  high-quality  recording  equipment.  The  items  in  a  list  were  read 
at  a  constant  vocal  effort,  and  at  a  rate  of  one  item  every  5.5 
seconds,  cued  by  a  flash  of  light  from  an  electronic  interval  timer. 
Errors  and  slurred  productions  were  removed  by  repeating  the  whole 
list.  It  took  approximately  three  hours  to  record  each  talker. 

D.  Selection of Lists and Systems for Pilot Experiment

Although all 64 lists in the complete test were recorded,
the amount of material involved precludes using the complete test,
except for testing real-time systems. To keep the experiment within
reasonable proportions, we selected seven consonant lists from the
total of 64, and five of the computer-simulated vocoder systems from
the 14 used in our earlier quality tests. Six of the selected lists
were  from  the  set  spoken  by  the  male  speaker,  and  one  was  spoken  by 
the female speaker. The reasons for choosing only consonant lists
were:

1. The consonant lists are intrinsically harder than the vowel
lists, partly because most of them require two responses per
item.

2. The vowel tests require of the subjects a greater
familiarity with phonetic symbols for writing down their
responses, and we wished to avoid lengthy training sessions.

The lists we selected are underlined in Table 3. They consist of
lists 1BM, 2AM, 3BM, 4BM, 7AM, and 10AM spoken by the male talker,
and list 7BF spoken by the female.

In  addition  to  the  9-bit  PCM,  unvocoded  version  of  each  test 
list,  the  seven  lists  were  processed  through  four  vocoder  systems. 
These  selected  systems  were  systems  A,  D,  F  and  G  in  BBN  Report  No. 
3209,  which  were  all  fixed-rate  systems,  so  that  their  bit  rates  did 
not  vary  with  the  speech  material. 

The  vocoders  include  one  of  the  best,  one  of  the  worst,  and  two 
other  systems  whose  relative  quality  depended  heavily  on  the  speech 
materials.


E.  Procedure 

In  our  first  pilot  experiment,  we  presented  the  35  processed 
lists  (7  lists  x  5  systems)  in  an  irregular  order  to  a  group  of 
listeners. It soon became obvious, however, that error rates were
low, and that subjects became aware that the same lists were being
repeated several times. We therefore redesigned the pilot
experiment to correct these two deficiencies.

First, by cutting and splicing the stimulus tapes, we arranged
that in each of the five presentations of a list, one through each
system, the list appeared in a different cyclic permutation.
Secondly, subjects were run in groups of four, and although each
group of subjects heard all 35 processed lists, in the same cyclic
order, each group started in a different place in the cyclic order.
Thus, each of the five versions of a given list was heard in the
first block of seven lists by one group of subjects, in the second
block of seven by a second group of subjects, and so on. This
effectively counterbalanced the presentation order, and controlled
for  learning  effects. 
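
The counterbalancing scheme just described can be illustrated with a short sketch. The layout of the master sequence below (each block of seven holds every list once, with the system shifted by one from block to block) is an assumption made for illustration; the report does not give the actual tape order. The function names are likewise hypothetical.

```python
LISTS = ["1BM", "2AM", "3BM", "4BM", "7AM", "7BF", "10AM"]
SYSTEMS = ["N", "A", "D", "F", "G"]


def rotate(seq, k):
    """Return seq cyclically rotated left by k positions."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def master_sequence(lists, systems):
    """Assumed layout: block b pairs list number i with system (i + b) mod 5."""
    n = len(systems)
    return [(lst, systems[(i + b) % n])
            for b in range(n)
            for i, lst in enumerate(lists)]


def group_orders(sequence, n_groups, block_size):
    """Same cyclic order for every group, each starting one block later."""
    return [rotate(sequence, g * block_size) for g in range(n_groups)]


def block_of(order, item, block_size):
    """Index of the block of block_size items in which item is heard."""
    return order.index(item) // block_size


master = master_sequence(LISTS, SYSTEMS)
orders = group_orders(master, len(SYSTEMS), len(LISTS))

# Across the five groups, a given processed list falls in every block once.
item = ("4BM", "F")
print([block_of(order, item, len(LISTS)) for order in orders])
```

The same `rotate` helper also describes the first measure, in which the items within a list are presented in a different cyclic permutation for each system.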

Thirdly,  a  revised  response  sheet  was  composed  for  each  test 
list,  as  shown  in  Figure  2,  and  a  secondary  task  was  introduced,  so 
that  correct  items  as  well  as  errors  would  yield  data  on  the 
relative  intelligibility  of  the  systems.  The  secondary  task  was  to 
write  down,  after  each  item,  the  number  appearing  on  a  digital 
counter  in  front  of  the  subjects.  The  clock  count  incremented  every 
100  msec,  and  the  count  was  reset  to  zero  by  the  experimenter  at  the 
instant of presentation of each stimulus item. Thus the subjects
were, in effect, recording a rather gross measure of the time they
had taken to make each response.

F.  Subjects

The twenty subjects were students at a local high school who
responded to an advertisement. They served in groups of four, and
were paid for their services. The experiment was run in a quiet
room, and the stimulus tapes were played through a high-quality
loudspeaker. The instructions that were read to the subjects are
presented  in  Appendix  E.  Several  practice  items  were  given,  and 
care  was  taken  to  make  sure  the  subjects  understood  the  task.  The 
whole  experiment  took  about  2  hours,  including  several  rests. 


G.  Results:  Overall  Error  Rates 

We present below a summary of the distribution of errors, as a
function  of  the  test  list,  and  the  vocoder  system  it  was  processed 
through.  We  also  present  confusion  matrices,  for  each  list  and 
system,  although  we  will  postpone  detailed  discussion  of  these  until 
a later report. Our analyses of the response-time data from the
secondary  task  are  not  yet  complete,  nor  have  we  made  comparisons 
between  the  results  of  the  present  intelligibility  tests  and  the 
earlier  quality  tests. 

The coarsest summary of errors is presented in Table 4, which
shows the total number of errors made by the 20 subjects,
categorized by the test list and by the vocoder system the list was
processed through. The error totals are further broken down by
whether the error occurred on an initial or a final consonant.

The total error rate across all systems and all lists was 9.14%
(a total of 1463 errors out of a possible 16,000). The total error
rate across all lists varied from 4.75% for the PCM unvocoded speech
to 12.6% for system F (10 poles, 25 msec frame size, 0.2 dB
quantization step size). The other three systems all generated
error rates close to 9.5%. Pooled across all systems, the error
rates on the different lists varied from 3.75% on list 10AM (initial
stop clusters) to 15.7% on list 4BM (voiced and voiceless
fricatives).  This  range  of  total  error  rates  was  considerably 
smaller  than  we  had  hoped:  it  appears  that  this  test  is  not 
sufficiently  difficult  to  separate  the  systems  very  widely.  An 
alternative  method  to  increase  the  difficulty  of  the  tests  is  to 
record the test materials under degraded conditions. The major
problem with this approach is reproducibility, since simply adding
noise is not very realistic. It is also important not to lose sight
of the conditions under which the vocoding system will actually be
used. If the problem is to select one of a pair of vocoder systems
for use in quiet offices, the results of comparing them in 100 dB
aircraft noise are not likely to be very relevant -- yet it may be
necessary to degrade recording conditions this much to get a
significant  difference  between  the  systems. 
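
The overall error-rate figure quoted above follows from simple division. The sketch below reproduces it. It assumes that the 16,000 possible responses are shared equally among the five systems, since every system processed the same lists for the same subjects. The variable names are illustrative only.

```python
def error_rate_percent(errors, possible):
    """Percentage of responses that were wrong."""
    return 100.0 * errors / possible


# Figures quoted in the text.
TOTAL_ERRORS = 1463
TOTAL_POSSIBLE = 16000
N_SYSTEMS = 5  # PCM plus four vocoders

overall = error_rate_percent(TOTAL_ERRORS, TOTAL_POSSIBLE)

# Assumption: equal number of possible responses per system.
per_system_possible = TOTAL_POSSIBLE // N_SYSTEMS

print(f"overall error rate: {overall:.2f}%")
print(f"possible responses per system: {per_system_possible}")
# Approximate error count implied by the 12.6% quoted for system F.
print(f"system F errors (approx.): {round(0.126 * per_system_possible)}")
```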

The overall error scores in Table 4 are not very informative.
For initial consonants and for final consonants, and for both
combined, System N (PCM speech) produced the fewest errors, and
System F produced the most. We have not yet completed a careful
comparison of the present results with those of the earlier quality
tests, but in those tests, System G was found to have consistently
worse quality than System F. Thus, at first sight it appears that
the quality results may be different from the intelligibility
results. It is interesting to note that, in the one list recorded
with a female voice, List 7BF, System G yielded the fewest errors --
fewer even than System N, the PCM original. This result does not
seem very likely -- it may be due to lack of balance between the
five  groups  of  experimental  subjects. 

Table  5  presents  the  same  error  data  as  Table  4,  this  time 
further  broken  down  by  each  phoneme  in  the  response  set.  Each  cell 
represents the number of errors made by twenty subjects, to two
presentations of the specified phoneme (three presentations for
final m, ng, in List 7AM; and final m, r, in List 7BF). Thus cell
totals are 40 (60 for the foregoing exceptions).

Inspection  of  Table  5  shows  that  a  few  phonemes  accounted  for  a 
large  number  of  errors.  For  example,  in  List  2AM,  /k/  in  initial 
position  yielded  20-22  errors  for  each  of  the  vocoder  systems  except 
N (PCM speech). Inspection of the individual subjects' response
sheets shows that subjects were in strong agreement on their errors:
of the total of 84 errors, 68 of the initial k's were heard as p's,
and 14 were heard as f's. It is possible that this high degree of
agreement was due to a response bias, induced perhaps by earlier
items in the list. Other examples that may have a similar
explanation occurred in List 3AM for initial /g/ (55 out of 56 g's
were heard as v's); in List 4BM for initial /zh/ (here the errors
may be due to subjects' lack of familiarity with the discrimination
required -- they are distributed over all systems, including N, the
PCM speech) and for final /s/ (59 out of 87 errors heard as z); and
for final /m/ in Lists 7AM and 7BF (33 out of 105, and 56 out of 61
being heard as ng, respectively). The overall error rates would be
considerably  lower  if  these  errors  were  ignored.  However,  it  should 
be  noted  that  few  of  these  errors  occurred  with  system  N  (PCM 
speech)  --  in  other  words,  they  only  occurred  when  the  speech  was 
somewhat  degraded  by  the  vocoder  system. 

Tables  6-12  give  an  even  more  detailed  break-down  of  the 
errors  for  each  list  in  the  confusion  matrices.  We  will  postpone 
detailed  discussion  of  these  until  we  have  made  the  comparisons  with 
the  results  of  the  quality  tests.  The  analysis  of  the  reaction  time 
data will also be available by then.

Table 4: Total errors made by the 20 subjects (initial and final
consonants combined), by test list and system.

System    1BM    4BM    2AM    3BM    7AM    7BF   10AM   Total
N          16     54     23     18      7     28      5     151
A          42     89     53     37     32     33      2     288
D          31     63     57     69     47     31     14     312
F          31     93     55     91     58     43     33     404
G          49     78     63     48     44     21      5     308
Total     169    377    251    263    188    156     59    1463
%        7.04  15.71  10.46  10.96   7.80   6.50   3.69    9.14

Response sets: ptk, bdg (1BM); fs,sh, vz,zh (4BM); ptk, fs,sh (2AM);
bdg, vz,zh (3BM); initial lmnrwy, final lrmn,ng (7AM, 7BF).

[The separate initial-consonant and final-consonant rows of Table 4
are not legible.]


BBN SPEECH COMPRESSION PROJECT


SUMMARY OF MAJOR RESULTS

NSC Note 77, December 15, 1975

(Author: R. Viswanathan)


The  overall  goal  of  our  research  has  been  to  develop  a 
Linear  Predictive  Speech  Compression  (LPC)  system  that 
transmits  high  quality  speech  at  the  lowest  possible  data 
rates.  We  have  developed  several  methods  for  reducing  the 
redundancy in the speech signal without sacrificing speech
quality.  Below  is  a  summary  of  the  major  results  and 
conclusions  of  our  work  in  the  last  three  years. 

1.  Preemphasis

Preemphasis  of  speech  reduces  its  spectral  dynamic 
range,  which  in  turn  (1)  diminishes  the  magnitude  of 
problems  due  to  finite  wordlength  computation,  and  (2) 
improves  parameter  quantization  accuracy.  We  recommend 
first-order  preemphasis  (fixed  or  adaptive);  second-order 
preemphasis  leads  to  perceivable  distortions  in  synthesized 
speech [1,2].
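
As a minimal sketch of fixed first-order preemphasis, the filter y(n) = x(n) - a x(n-1) can be written as below. The coefficient value 0.9 is an assumption chosen for illustration; this note does not state the value used. The function name is also hypothetical.

```python
def preemphasize(x, a=0.9):
    """First-order preemphasis: y[n] = x[n] - a * x[n-1], with x[-1] = 0.

    The default a = 0.9 is an illustrative assumption.
    """
    out = []
    prev = 0.0
    for sample in x:
        out.append(sample - a * prev)
        prev = sample
    return out


# A constant (low-frequency) input is attenuated after the first sample,
# while a sign-alternating (high-frequency) input is boosted.
print(preemphasize([1.0, 1.0, 1.0, 1.0], a=0.5))
print(preemphasize([1.0, -1.0, 1.0, -1.0], a=0.5))
```

Attenuating the low end relative to the high end is what reduces the spectral dynamic range of voiced speech, whose spectrum falls off with frequency.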

2.  Variable Order Linear Prediction

We  transmit  for  every  frame  the  minimum  number  of 
predictor  parameters  which  adequately  represent  the  speech 
spectrum  in  that  frame.  Our  method  uses  an  information 
theoretic  criterion  to  determine  the  "optimal"  order,  and 
produces average savings of 10% in the transmission rate
[2,3].

3.  Choice of Parameters for Quantization and Transmission

(a) Pitch: We found that quantizing the logarithm of
pitch values was adequate. However, a difficulty arises in
attempting to quantize the log pitch in that at the high
frequency end (small pitch period) of the range of interest,
the quantization bin size, as found by dividing the log
pitch scale into equal segments, can be so small as to
result in cases where two distinct quantization bins yield
the same decoded value, thus wasting some quantization
levels. We proposed a method for deriving the pitch coding
and decoding tables in such a way that maximum usage is made
of the different quantization levels [4].
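
The difficulty described in (a) can be reproduced with a small sketch. It assumes that the pitch period is an integer number of samples and that each bin decodes to the rounded geometric centre of the bin. The range of 20 to 160 samples and the 128 levels are hypothetical values chosen to make the effect visible. The sketch shows only the problem, not the table-derivation method of [4].

```python
import math


def log_pitch_decoding_table(p_min, p_max, n_levels):
    """Decoded integer pitch period for each of n_levels equal log-scale bins."""
    step = (math.log(p_max) - math.log(p_min)) / n_levels
    return [round(math.exp(math.log(p_min) + (k + 0.5) * step))
            for k in range(n_levels)]


# Hypothetical range and number of levels.
table = log_pitch_decoding_table(20, 160, 128)
wasted = len(table) - len(set(table))

print("first bins decode to:", table[:6])
print("levels that duplicate another level:", wasted)
```

Near the small-period end each bin is narrower than one sample, so adjacent bins round to the same period and the extra levels carry no information.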

(b)  Gain:  Our  findings  based  on  statistical  error 
analysis  indicated  that,  in  general,  it  is  better  to  use 
speech  signal  energy  for  transmission  than  to  use  prediction 
error  signal  energy  [5]. 

(c) Filter Parameters: From a comparative study of a
number of equivalent sets of predictor parameters, we [...]

4.  Variable Frame Rate Transmission


LPC  parameters  are  transmitted  at  variable  intervals  in 
accordance  with  the  changing  characteristics  of  the  incoming 
speech.  The  decision  to  transmit  is  based  on  a  threshold  on 
the  log  likelihood  ratio  of  prediction  residuals.  We  found 
that,  for  a  given  average  bit  rate,  variable  frame  rate 
transmission produces better quality speech than fixed
frame rate transmission [2,8,9].

5.  Encoding


We  use  a  variable  length  code  (Huffman  code)  to  encode 
the  quantized  transmission  parameters  at  significantly  lower 
bit  rates  (savings  on  the  order  of  15%),  and  with  absolutely 
no effect on speech quality [10].
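
To illustrate how a variable length code lowers the average bit rate without changing the decoded values, here is a minimal sketch of textbook Huffman code construction. The symbol frequencies are hypothetical and are not taken from our parameter statistics; the function name is illustrative.

```python
import heapq


def huffman_code(freqs):
    """Return {symbol: bit string} for a dict mapping symbols to frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Each heap entry is (weight, tie_breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical occurrence counts for four quantizer levels.
freqs = {0: 50, 1: 25, 2: 15, 3: 10}
code = huffman_code(freqs)
total = sum(freqs.values())
average_bits = sum(freqs[s] * len(code[s]) for s in freqs) / total

print(code)
print(f"average bits per symbol: {average_bits} (fixed-length code: 2)")
```

Because the code is prefix-free, the receiver recovers exactly the same quantized levels as with a fixed-length code, which is why the saving costs nothing in speech quality.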

6.  Synthesis

(a) Time-Synchronous Synthesis: We found that
time-synchronous updating (e.g., every 5 or 10 msec) of the
filter parameters at the synthesizer yields better speech
quality than pitch-synchronous updating if the analysis is
performed time-synchronously [2]. Time-synchronous
parameter  updating  has  the  additional  advantage  of 
simplifying  the  necessary  computations. 

(b) Gain Implementation: We recommend implementing the
speech signal energy as a gain multiplier at the input of
the synthesizer filter. With the gain multiplier placed at
the output of the filter, perceivable distortions are
produced in synthesized speech at places where relatively
large frame-to-frame energy changes occur [8]. (There are,
however, ad hoc solutions to this problem.)

(c) Optimal Linear Interpolation: For improved
interpolation of synthesizer parameters, we proposed a
scheme that requires the transmission of an extra parameter
per data frame [11]. This optimal linear interpolation
scheme  improves  speech  quality  during  rapid  transitions  in 
the  speech  signal,  at  the  expense  of  increasing  the  bit  rate 
by 50-150 bps.

7.  Simulation of LPC Systems

Using floating-point arithmetic we simulated the entire
speech compression system with its many different variations
in our TENEX time-sharing computer facility [2]. Using this
simulation  system,  we  demonstrated  the  results  of  three  low 
bit-rate  LPC  systems  at  ARPA  NSC  meetings.  The  first  system 
produced  good  quality  speech  at  average  rates  of  1500 
bps [2,12]. Speech quality degraded noticeably for the
second  system  with  an  average  transmission  rate  of  1000  bps, 
although  the  intelligibility  of  the  transmitted  speech  was 
still  good  [8].  The  third  system,  which  used  differential 
pulse  code  modulation  (DPCM)  for  quantizing  the  transmission 
parameters,  yielded  good  speech  quality  at  essentially  fixed 
rates of 2000 bps [8]. No explicit silence detection was
employed  in  these  three  systems. 

8.  Steps Towards Real-Time Implementation

We  worked  in  cooperation  with  the  other  sites  in  the 
ARPA  community  towards  implementation  of  an  LPC  vocoder  that 
transmits speech in real time over the ARPA Network.


NEW LATTICE METHODS FOR LINEAR PREDICTION


This  paper  presents  a  new  formulation  for  linear 
prediction,  which  we  call  the  covariance  lattice  method. 
The  method  is  viewed  as  one  of  a  class  of  lattice  methods 
which  guarantee  the  stability  of  the  all-pole  filter,  with 
or  without  windowing  of  the  signal,  with  finite  wordlength 
computations,  and  with  the  number  of  computations  being 
comparable  to  the  traditional  autocorrelation  and  covariance 
methods.  In  addition,  quantization  of  the  reflection 
coefficients  can  be  accomplished  within  the  recursion  for 
retention  of  accuracy  in  representation. 

1.  Introduction

The  autocorrelation  method  of  linear  prediction  [1] 
guarantees  the  stability  of  the  all-pole  filter,  but  has  the 
disadvantage  that  windowing  of  the  signal  causes  some 
unwanted  distortion  in  the  spectrum.  In  practice,  even  the 
stability  is  not  always  guaranteed  with  finite  wordlength 
(FWL)  computations  [2].  On  the  other  hand,  the  covariance 
method  does  not  guarantee  the  stability  of  the  filter,  even 
with  floating  point  computation,  but  has  the  advantage  that 
there  is  no  windowing  of  the  signal.  One  solution  to  these 
problems was given by Itakura [3] in his lattice
formulation. In this method, filter stability is
guaranteed, with no windowing, and with FWL computations.
Unfortunately,  this  is  accomplished  with  about  a  four-fold 
increase  in  computation  over  the  other  two  methods. 

This  paper  presents  a  class  of  lattice  methods  which 
have  all  the  properties  of  a  regular  lattice  but  where  the 
number  of  computations  is  comparable  to  the  autocorrelation 
and  covariance  methods.  In  these  methods  the  "forward"  and 
"backward"  residuals  are  not  computed.  The  reflection 
coefficients  are  computed  directly  from  the  covariance  of 
the  input  signal. 


2. Lattice Formulations

In linear prediction, the signal spectrum is modeled by an
all-pole spectrum with a transfer function given by

$$H(z) = \frac{G}{A(z)}, \tag{1}$$

where

$$A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k} \tag{2}$$

is known as the inverse filter, $G$ is a gain factor, $a_k$ are
the predictor coefficients, and $p$ is the number of poles or
predictor coefficients in the model. If $H(z)$ is stable,
$A(z)$ can be implemented as a lattice filter, as shown in
Fig. 1. The reflection (or partial correlation)
coefficients $K_i$ in the lattice are uniquely related to the
predictor coefficients. Given $K_i$, $1 \le i \le p$, the set
$\{a_k\}$ is computed by the recursive relation:

$$a_i^{(i)} = K_i,$$
$$a_j^{(i)} = a_j^{(i-1)} + K_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1, \tag{3}$$

where the equations in (3) are computed recursively for
$i = 1, 2, \ldots, p$. The final solution is given by
$a_j = a_j^{(p)}$, $1 \le j \le p$.
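
The recursion in (3) can be illustrated with a short Python sketch. The function name `reflection_to_predictor` and the list layout are assumptions made for illustration; they do not appear in the original text.

```python
def reflection_to_predictor(K):
    """Apply recursion (3): map K_1..K_p to predictor coefficients a_1..a_p."""
    a = []  # a[j-1] holds a_j^{(i)} for the stage i just completed
    for K_i in K:
        i = len(a) + 1
        # a_j^{(i)} = a_j^{(i-1)} + K_i * a_{i-j}^{(i-1)},  1 <= j <= i-1
        updated = [a[j - 1] + K_i * a[i - j - 1] for j in range(1, i)]
        updated.append(K_i)  # a_i^{(i)} = K_i
        a = updated
    return a
```

For example, `reflection_to_predictor([0.5, -0.25])` returns `[0.375, -0.25]`, since $a_1^{(2)} = 0.5 + (-0.25)(0.5)$.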


For a stable $H(z)$, one must have

$$|K_i| < 1, \quad 1 \le i \le p. \tag{4}$$


In the lattice formulation, the reflection coefficients
can be computed by minimizing some error norm of the forward
residual $f_m(n)$ or the backward residual $b_m(n)$, or a
combination of the two. From Fig. 1, the following
relations hold:

$$f_0(n) = b_0(n) = s(n), \tag{5a}$$
$$f_{m+1}(n) = f_m(n) + K_{m+1}\, b_m(n-1), \tag{5b}$$
$$b_{m+1}(n) = K_{m+1}\, f_m(n) + b_m(n-1). \tag{5c}$$

$s(n)$ is the input signal and $e(n) = f_p(n)$ is the output
residual.
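
As an illustration of (5), the sketch below runs a signal through the lattice inverse filter. It assumes a zero initial state, i.e. $b_m(-1) = 0$, which the text does not specify; the function name is hypothetical.

```python
def lattice_inverse_filter(s, K):
    """Return (f_p, b_p), the final-stage forward and backward residuals."""
    f = list(s)  # f_0(n) = s(n)            (5a)
    b = list(s)  # b_0(n) = s(n)            (5a)
    for K_next in K:
        # b_m(n-1), with b_m(-1) taken as zero (an assumption)
        b_delayed = [0.0] + b[:-1]
        f_next = [f[n] + K_next * b_delayed[n] for n in range(len(s))]  # (5b)
        b_next = [K_next * f[n] + b_delayed[n] for n in range(len(s))]  # (5c)
        f, b = f_next, b_next
    return f, b
```

With `s = [1.0, 2.0, 3.0, 4.0]` and `K = [0.5, -0.25]`, the forward output equals the direct-form residual $s(n) + 0.375\,s(n-1) - 0.25\,s(n-2)$, which is what (3) predicts for these reflection coefficients.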


Fig. 1. Lattice inverse filter.

We shall give several methods for the determination of
the reflection coefficients. These methods depend on
different ways of correlating the forward and backward
residuals. Below, we shall make use of the following
definitions:

$$F_m(n) = E[f_m^2(n)], \tag{6a}$$
$$B_m(n) = E[b_m^2(n)], \tag{6b}$$
$$C_m(n) = E[f_m(n)\, b_m(n-1)], \tag{6c}$$

where $E(\cdot)$ denotes expected value. The left hand side of
each of the equations in (6) is a function of n because we
are making the general assumption that the signals are
nonstationary. (Subscripts, etc., will be dropped sometimes
for convenience.)


(a) Forward Method

In this method the reflection coefficient at stage m+1
is obtained as a result of the minimization of an error norm
given by the variance (or mean square) of the forward
residual:

$$F_{m+1}(n) = E[f_{m+1}^2(n)]. \tag{7}$$

By substituting (5b) in (7) and differentiating with respect
to $K_{m+1}$, one obtains:

$$K_{m+1}^{f} = -\frac{E[f_m(n)\, b_m(n-1)]}{E[b_m^2(n-1)]} = -\frac{C_m(n)}{B_m(n-1)}. \tag{8}$$

This method of computing the filter parameters is similar to
the autocorrelation and covariance methods in that the mean
squared forward error is minimized.


(b) Backward Method

In this case, the minimization is performed on the
variance of the backward residual at stage m+1. From (5c)
and (6b), the minimization of $B_{m+1}(n)$ leads to:

$$K_{m+1}^{b} = -\frac{E[f_m(n)\, b_m(n-1)]}{E[f_m^2(n)]} = -\frac{C_m(n)}{F_m(n)}. \tag{9}$$

Note that, since $F_m(n)$ and $B_m(n-1)$ are both nonnegative and
the numerators in (8) and (9) are identical, $K^f$ and $K^b$
always have the same sign S:

$$S = \operatorname{sign} K^f = \operatorname{sign} K^b. \tag{10}$$

(c) Geometric Mean Method (Itakura)

The main problem in the above two techniques is that
the computed reflection coefficients are not always
guaranteed to be less than 1 in magnitude, i.e., the
stability of $H(z)$ is not guaranteed. One solution to this
problem was offered by Itakura [3] where the reflection
coefficients are computed from

$$K_{m+1}^{I} = -\frac{E[f_m(n)\, b_m(n-1)]}{\sqrt{E[f_m^2(n)]\, E[b_m^2(n-1)]}} = -\frac{C_m(n)}{\sqrt{F_m(n)\, B_m(n-1)}}. \tag{11}$$

$K_{m+1}^{I}$ is the negative of the statistical correlation between
$f_m(n)$ and $b_m(n-1)$; hence, property (4) follows. To the
author's knowledge, (11) cannot be derived directly by
minimizing some error criterion. However, from (8), (9) and
(11), one can easily show that $K^I$ is the geometric mean of
$K^f$ and $K^b$:

$$K^I = S \sqrt{K^f K^b}, \tag{12}$$

where S is given by (10). From the properties of the
geometric mean, it follows that:

$$\min[|K^f|, |K^b|] \le |K^I| \le \max[|K^f|, |K^b|]. \tag{13}$$

Now, since $|K^I| < 1$, it follows that if the magnitude of
either $K^f$ or $K^b$ is greater than 1, the magnitude of the
other is necessarily less than 1. This leads us to another
definition for the reflection coefficient.

(d) Minimum Method

$$K^M = S \min[|K^f|, |K^b|]. \tag{14}$$

This says that, at each stage, compute $K^f$ and $K^b$ and choose
as the reflection coefficient the one with the smaller
magnitude.


(e) General Method

Between $K^I$ and $K^M$ there are an infinity of values that
can be chosen as valid reflection coefficients (i.e., $|K| < 1$).
These can be conveniently defined by taking the generalized
rth mean of $K^f$ and $K^b$:

$$K^r = S \left[ \tfrac{1}{2} \left( |K^f|^r + |K^b|^r \right) \right]^{1/r}. \tag{15}$$

As $r \to 0$, $K^r \to K^I$, the geometric mean. For $r > 0$, $K^r$
cannot be guaranteed to satisfy (4). Therefore, for $K^r$ to be
a reflection coefficient, we must have $r \le 0$. In particular:

$$K^0 = K^I, \qquad K^{-\infty} = K^M. \tag{16}$$

If the signal is stationary, one can show that $K^f = K^b$, and
that

$$K^r = K^f = K^b, \quad \text{all } r. \quad \text{(Stationary Case)} \tag{17}$$


(f) Harmonic Mean Method (Burg)

There is one value of r for which $K^r$ has some
interesting properties, and that is r = -1. $K^{-1}$, then, would
be the harmonic mean of $K^f$ and $K^b$:

$$K^B = K^{-1} = \frac{2 K^f K^b}{K^f + K^b} = -\frac{2\, C_m(n)}{F_m(n) + B_m(n-1)}. \tag{18}$$

One can show that

$$|K^M| \le |K^B| \le |K^I|. \tag{19}$$

In fact, Itakura used $K^B$ as an approximation to $K^I$ in (11)
to avoid computing the square root.

One important property of $K^B$ that is not shared by $K^I$
and $K^M$, is that $K^B$ results directly from the minimization of
an error criterion. The error is defined as the sum of the
variances of the forward and backward residuals:

$$E_{m+1}(n) = F_{m+1}(n) + B_{m+1}(n). \tag{20}$$

Using (5) and (6), one can show that the minimization of
(20) indeed leads to (18). One can also show that the
forward and backward minimum errors at stage m+1 are related
to those at stage m by the following:

$$F_{m+1}(n) = \left[ 1 - (K_{m+1}^{B})^2 \right] F_m(n), \tag{21a}$$
$$B_{m+1}(n) = \left[ 1 - (K_{m+1}^{B})^2 \right] B_m(n-1). \tag{21b}$$

This formulation is originally due to Burg [4]; it has been
used recently by Boll [5] and Atal [6].

(g) Discussion

If the signal s(n) is stationary, all the methods
described above give the same result. In general, the
signal cannot be assumed to be stationary and the different
methods will give different results. Which method to choose
in a particular situation is not clear cut. We tend to
prefer the use of $K^B$ in (18) because it minimizes a
reasonable and well defined error and guarantees stability
simultaneously, even for a nonstationary signal.
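
The definitions (8), (9), (11), (14), (15) and (18) can be compared numerically with a small Python sketch. The function names and the dictionary keys are illustrative assumptions; `F`, `B` and `C` stand for $F_m(n)$, $B_m(n-1)$ and $C_m(n)$, and are assumed to be positive (F, B) and nonzero (C) where a division or a negative power requires it.

```python
import math


def reflection_candidates(F, B, C):
    """Reflection coefficient at one stage under each definition."""
    Kf = -C / B                      # forward method (8)
    Kb = -C / F                      # backward method (9)
    S = math.copysign(1.0, Kf)       # common sign (10)
    return {
        "forward": Kf,
        "backward": Kb,
        "geometric": -C / math.sqrt(F * B),       # Itakura (11)
        "minimum": S * min(abs(Kf), abs(Kb)),     # (14)
        "harmonic": -2.0 * C / (F + B),           # Burg (18)
    }


def generalized_mean(Kf, Kb, r):
    """Generalized r-th mean (15); r must be nonzero, Kf and Kb nonzero."""
    S = math.copysign(1.0, Kf)
    return S * (0.5 * (abs(Kf) ** r + abs(Kb) ** r)) ** (1.0 / r)
```

With F = 4, B = 1 and C = -1.5, the forward value is 1.5 (unstable), while the minimum, harmonic and geometric values are 0.375, 0.6 and 0.75, in the order stated by (19).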

3. The Covariance-Lattice Method

If linear predictive analysis is to be performed on a
regular computer, the number of computations for the lattice
methods given above far exceeds that of the autocorrelation
and covariance methods (see the first row of Fig. 2). This
is unfortunate since, otherwise, lattice methods have
superior properties when compared to the autocorrelation and
covariance methods (see Fig. 3). Below, we derive a new
method, called the covariance-lattice method, which has all
the advantages of a regular lattice, but with an efficiency
comparable to the two non-lattice methods.

                      AUTOCORRELATION         COVARIANCE                  REGULAR LATTICE
                      METHOD                  METHOD                      (WITH RESIDUALS)

TRADITIONAL METHODS   pN + p^2                pN + (1/6)p^3 + (3/2)p^2    5pN

NEW LATTICE METHODS   pN + [?]p^3 + [?]p^2    pN + [?]p^3 + 2p^2          5pN

Fig. 2. Computational cost for traditional as
compared to new lattice methods.


LINEAR PREDICTION METHOD   ADVANTAGES                         DISADVANTAGES

AUTOCORRELATION            1. THEORETICAL STABILITY           1. WINDOWING
                           2. COMPUTATIONALLY EFFICIENT       2. POSSIBLE INSTABILITY
                                                                 WITH FWL COMPUTATION

COVARIANCE                 1. NO WINDOWING                    1. STABILITY NOT
                           2. COMPUTATIONALLY EFFICIENT          GUARANTEED EVEN WITH
                                                                 FLOATING POINT

REGULAR LATTICE            1. WINDOWING NOT NECESSARY         1. COMPUTATIONALLY
                           2. STABILITY CAN BE GUARANTEED        EXPENSIVE
                           3. NUMBER OF SAMPLES FOR
                              ANALYSIS CAN BE REDUCED
                           4. REFLECTION COEFFICIENTS CAN BE
                              QUANTIZED WITHIN RECURSION

COVARIANCE LATTICE         1-4. SAME AS FOR REGULAR
                                LATTICE METHOD
                           5. COMPUTATIONALLY EFFICIENT

Fig. 3. Comparison between different LP methods.

From the recursive relations in (3) and (5), one can
show that

$$f_m(n) = \sum_{k=0}^{m} a_k^{(m)} s(n-k), \tag{22a}$$
$$b_m(n) = \sum_{k=0}^{m} a_k^{(m)} s(n-m+k), \tag{22b}$$

where $a_0^{(m)} = 1$. Squaring (22a) and taking the expected
value, there results

$$F_m(n) = \sum_{k=0}^{m} \sum_{i=0}^{m} a_k^{(m)} a_i^{(m)} \phi(k,i), \tag{23}$$

where

$$\phi(k,i) = E[s(n-k)\, s(n-i)] \tag{24}$$

is the nonstationary autocorrelation (or covariance) of the
signal s(n). ($\phi(k,i)$ in (24) is technically a function of
n, which has been dropped for convenience.) In a similar
fashion one can show from (22b), with n replaced by n-1,
that

$$B_m(n-1) = \sum_{k=0}^{m} \sum_{i=0}^{m} a_k^{(m)} a_i^{(m)} \phi(m+1-k,\, m+1-i), \tag{25}$$
$$C_m(n) = \sum_{k=0}^{m} \sum_{i=0}^{m} a_k^{(m)} a_i^{(m)} \phi(k,\, m+1-i). \tag{26}$$

Given the covariance of the signal, the reflection
coefficient at stage m+1 can be computed from (23), (25) and
(26) by substituting them in the desired formula for $K_{m+1}$.
The name "covariance-lattice" stems from the fact that this
is basically a lattice method that is computed from the
covariance of the signal; it can be viewed as a way of
stabilizing the covariance method. One salient feature is
that the forward and backward residuals are never actually
computed in this method. But this is not different from the
non-lattice methods.
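
The quantities in (23), (25) and (26) can be sketched in Python as follows. The sketch assumes that the expectation in (24) is replaced by a sum over the samples n = p, ..., N-1, as is usual in the covariance method; that choice, and the function names, are assumptions and not part of the original text.

```python
def covariance(s, p):
    """phi[k][i] = sum over n = p..N-1 of s(n-k) s(n-i), for 0 <= k, i <= p."""
    N = len(s)
    return [[sum(s[n - k] * s[n - i] for n in range(p, N))
             for i in range(p + 1)]
            for k in range(p + 1)]


def stage_quantities(phi, a, m):
    """Return (F_m(n), B_m(n-1), C_m(n)) from (23), (25), (26).

    a = [1, a_1^{(m)}, ..., a_m^{(m)}] holds the stage-m predictor.
    """
    idx = range(m + 1)
    F = sum(a[k] * a[i] * phi[k][i] for k in idx for i in idx)
    B = sum(a[k] * a[i] * phi[m + 1 - k][m + 1 - i] for k in idx for i in idx)
    C = sum(a[k] * a[i] * phi[k][m + 1 - i] for k in idx for i in idx)
    return F, B, C
```

At stage m = 0 the three values reduce to $\phi(0,0)$, $\phi(1,1)$ and $\phi(0,1)$, which gives a quick sanity check on the indexing.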


In the harmonic mean method (18), $F_m(n)$ need not be
computed from (23); one can use (21a) instead, with m
replaced by m-1. However, one must use (25) to compute
$B_m(n-1)$; (21b) cannot be used because $B_{m-1}(n-2)$ would be
needed and it is not readily available.


(a) Stationary Case

For a stationary signal, the covariance reduces to the
autocorrelation:

$$\phi(k,i) = R(i-k) = R(k-i). \quad \text{(Stationary)} \tag{27}$$

From (23-27), it is clear that

$$F_m = B_m = \sum_{k=0}^{m} \sum_{i=0}^{m} a_k^{(m)} a_i^{(m)} R(i-k) \tag{28}$$

and

$$C_m = \sum_{k=0}^{m} \sum_{i=0}^{m} a_k^{(m)} a_i^{(m)} R(m+1-i-k). \tag{29}$$

Making use of the normal equations [1]

$$\sum_{i=0}^{m} a_i^{(m)} R(i-k) = 0, \quad 1 \le k \le m, \tag{30}$$

and of (21), one can show that the stationary reflection
coefficient is given by:

$$K_{m+1} = -\frac{C_m}{F_m} = -\frac{\sum_{k=0}^{m} a_k^{(m)} R(m+1-k)}{F_m}, \tag{31}$$

where $F_m$ follows from $F_{m-1}$ through (21a), with $F_0 = R(0)$.
(31) is exactly the equation used in the
autocorrelation method.
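
A minimal Python sketch of the stationary recursion, assuming (31) for the reflection coefficient, (3) for the predictor update and (21a) for the error update; the function name and argument layout are hypothetical.

```python
def autocorrelation_method(R, p):
    """Return (K, a, F_p) from autocorrelations R[0..p]."""
    a = [1.0]      # a_0 = 1
    F = R[0]       # F_0 = R(0)
    K = []
    for m in range(p):
        # Numerator of (31): sum over k of a_k^{(m)} R(m+1-k)
        C = sum(a[k] * R[m + 1 - k] for k in range(m + 1))
        K_next = -C / F
        # Recursion (3) applied to the stage-m predictor
        a = ([1.0]
             + [a[j] + K_next * a[m + 1 - j] for j in range(1, m + 1)]
             + [K_next])
        F *= 1.0 - K_next ** 2     # (21a)
        K.append(K_next)
    return K, a, F
```

For `R = [1.0, 0.5, 0.25]`, the autocorrelation of a first-order process, the first reflection coefficient is -0.5 and the second is zero, as expected.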


(b) Quantization of Reflection Coefficients

One of the features of lattice methods is that the
quantization of the reflection coefficients can be
accomplished within the recursion, i.e., $K_m$ can be quantized
before $K_{m+1}$ is computed. In this manner, it is hoped that
some of the effects of quantization can be compensated for.

In applying the covariance-lattice procedure to the
harmonic mean method, one must be careful to use (23) and
not (21a) to compute $F_m(n)$. The reason is that (21a) is
based on the optimality of $K^B$, which would no longer be true
after quantization.

Similar reasoning can be applied to the autocorrelation
method. Those who have tried to quantize $K_m$ inside the
recursion have no doubt been met with serious difficulties.
The reason is that (31) assumes the optimality of the
predictor coefficients at stage m, which no longer would be
true if $K_m$ were quantized. The solution is to use (28) and
(29), which make no assumptions of optimality. Thus, we
have what we shall call the autocorrelation lattice method,
where there is only one definition of $K_{m+1}$:

$$K_{m+1} = -\frac{C_m}{F_m}, \quad \text{(Autocorrelation-Lattice)} \tag{32}$$

where $F_m$ and $C_m$ are given by (28) and (29).


4. Computational Issues

(a) Simplifications

Equations (23), (25) and (26) can be rewritten to reduce
the number of computations by about one half. The results
for $C_m(n)$ and $F_m(n) + B_m(n-1)$ can be shown to be as follows:

$$\begin{aligned}
C_m(n) ={}& \phi(0, m+1) + \sum_{k=1}^{m} a_k^{(m)} \left[ \phi(0, m+1-k) + \phi(k, m+1) \right] \\
&+ \sum_{k=1}^{m} \left[ a_k^{(m)} \right]^2 \phi(k, m+1-k) \\
&+ \sum_{k=1}^{m-1} \sum_{i=k+1}^{m} a_k^{(m)} a_i^{(m)} \left[ \phi(k, m+1-i) + \phi(i, m+1-k) \right],
\end{aligned} \tag{33}$$

$$\begin{aligned}
F_m(n) + B_m(n-1) ={}& \phi(0,0) + \phi(m+1, m+1) \\
&+ 2 \sum_{k=1}^{m} a_k^{(m)} \left[ \phi(0,k) + \phi(m+1, m+1-k) \right] \\
&+ \sum_{k=1}^{m} \left[ a_k^{(m)} \right]^2 \left[ \phi(k,k) + \phi(m+1-k, m+1-k) \right] \\
&+ 2 \sum_{k=1}^{m-1} \sum_{i=k+1}^{m} a_k^{(m)} a_i^{(m)} \left[ \phi(k,i) + \phi(m+1-k, m+1-i) \right].
\end{aligned} \tag{34}$$

(28) and (29) can also be simplified in a similar fashion.


(c) Computational Cost

Fig. 2 shows a comparison of the number of computations
for the different methods, where terms of order p have been
neglected. The increase in computation for the covariance
lattice method over non-lattice methods is not significant
if N is large compared to p, which is usually the case.
Furthermore, in the covariance lattice method, the number of
signal samples can be reduced to about half that used in the
autocorrelation method. This not only reduces the number
of computations, but also improves the spectral
representation by reducing the amount of averaging.

5. Procedure

Below is the complete algorithm for what we believe
currently to be the best overall method for linear
predictive analysis. It comprises the harmonic mean
definition (18) for the reflection coefficients, and the
covariance lattice method.

(a) Compute the covariances $\phi(k,i)$ for $k, i = 0, 1, \ldots, p$.

(b) $m \leftarrow 0$.

(c) Compute $C_m(n)$ and $F_m(n) + B_m(n-1)$ from (33) and (34), or
from (23), (25) and (26).

(d) Compute $K_{m+1}$ from (18).

(e) Quantize $K_{m+1}$ if desired (perhaps using log area
ratios [7] or some other technique).

(f) Using (3), compute the predictor coefficients $\{a_k^{(m+1)}\}$
from $\{a_k^{(m)}\}$ and $K_{m+1}$. Use the quantized value if $K_{m+1}$
was quantized in (e).

(g) $m \leftarrow m+1$.

(h) If $m < p$, go to (c); otherwise exit.
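
Steps (a) through (h) can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the covariance is taken as a sum over n = p, ..., N-1, the unsimplified forms (23), (25) and (26) are used instead of (33) and (34), and `covariance_lattice` and its optional `quantize` callback are hypothetical names.

```python
def covariance_lattice(s, p, quantize=None):
    """Harmonic-mean covariance-lattice analysis; returns (K, a)."""
    N = len(s)
    # (a) covariances phi(k, i), 0 <= k, i <= p (time average, assumed)
    phi = [[sum(s[n - k] * s[n - i] for n in range(p, N))
            for i in range(p + 1)]
           for k in range(p + 1)]
    a = [1.0]                       # stage-0 predictor, a_0 = 1
    K = []
    for m in range(p):              # (b), (g), (h)
        idx = range(m + 1)
        # (c) C_m(n) from (26) and F_m(n) + B_m(n-1) from (23) + (25)
        C = sum(a[k] * a[i] * phi[k][m + 1 - i] for k in idx for i in idx)
        FB = sum(a[k] * a[i] * (phi[k][i] + phi[m + 1 - k][m + 1 - i])
                 for k in idx for i in idx)
        K_next = -2.0 * C / FB      # (d) harmonic mean (18)
        if quantize is not None:    # (e) optional quantization
            K_next = quantize(K_next)
        # (f) recursion (3) with the (possibly quantized) K_{m+1}
        a = ([1.0]
             + [a[j] + K_next * a[m + 1 - j] for j in range(1, m + 1)]
             + [K_next])
        K.append(K_next)
    return K, a
```

Because F and B are recomputed from the covariances at every stage, and never propagated through (21a), a quantized $K_{m+1}$ can be substituted in step (e) without violating the optimality assumption discussed in Section 3(b).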
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I.  INTRODUCTION 

This  note  provides  specifications  for  ARPA-LPC  speech 
compression  system  II,  an  update  of  the  present  system  I. 
The  approach  we  employed  in  arriving  at  these  specifications 
has  been  to  reap  maximum  benefit  for  the  least  amount  of 
effort.  Our  overall  design  objective  has  been  to  achieve 
average  continuous-speech  transmission  rates  of  about  2200 
bps.  With  the  use  of  a  silence  detection  algorithm,  these 
rates  may  be  expected  to  drop  to  about  1000  bps  or  less. 

The following sections deal with only those aspects of
System I which need to be modified. The major differences
between  Systems  I  and  II  are  due  to: 

1.  Variable  frame  rate  (VFR)  transmission,  and 

2.  New  coding/decoding  tables  for  transmission 
parameters . 

Compared  to  System  I,  VFR  transmission  should  yield  a  lower 
(average)  frame  rate,  while  new  coding/decoding  tables 
employ  fewer  bits  per  transmitted  frame.  Thus,  both 
modifications  contribute  in  lowering  the  average  bit  rate. 

The specific recommendations put forth in this note
represent a first cut on our part. Comments and suggestions
are welcome.

In the preparation of this note we have had discussions
about implementation of VFR transmission on the SPS-41 with
Earl Craighill, Danny Cohen and Lynn Cosell. Joe Tierney
supplied the statistics for the reflection coefficients.
Randy Cole explained the details of the present gain table.

II.  PARCELS,  PACKETS  AND  NEGOTIATIONS 

Since  the  discussion  of  the  proposed  VFR  transmission 
scheme  requires  the  knowledge  of  the  particular  parcel 
format  chosen,  we  first  consider  the  latter  issue  in  this 
section along with related issues, such as packet format and
negotiations  to  establish  a  voice  link  on  the  ARPANET. 

A. Parcel Format

We propose that a variable-length parcel be transmitted
for every analysis frame. A parcel has a 3-bit header: the
first bit is 1 only if the parcel contains pitch data;
similarly, the second and third header bits indicate whether
the parcel contains codes of gain and reflection coefficients
(or k-parameters), respectively. A parcel, therefore, can be
as small as 3 bits, and as large as 50 bits. (The breakdown
of the possible additional 47 bits among pitch, gain and
reflection coefficients is given in Section IV.)

The parcel format just described allows the use of a
separate transmission criterion for each of the three groups
of analysis parameters: pitch, gain, and reflection
coefficients. The primary reasons for proposing this
independent transmission policy are: 1) It is the most
general approach, and therefore individual variations can be
implemented with relative ease. 2) In general, significant
variations in each of the three parameter groups do not
occur simultaneously. Our experience with low average
frame-rate transmission has shown that if pitch and gain are
transmitted only when reflection coefficients are
transmitted, perceivable speech quality distortions result
[1].

We have considered an alternate parcel format whereby a
parcel of data is transmitted, not for every analysis frame,
but only when a parameter transmission occurs. This means
that the parcel should also contain a code to specify the
interval between transmissions, which is variable on account
of VFR transmission. The disadvantages of this alternate
format are as follows. First, the maximum transmission
interval has to be restricted to be small so it can be coded
using a small number of bits. For example, a code length of
3 bits means that the transmission interval can only be as
long as 8 analysis frames. Secondly, independent
transmission of pitch, gain and reflection coefficients
requires the transmission of 3 separate codes corresponding
to the 3 independent transmission intervals. For the range
of average frame rates we are interested in, the resulting
parcel overhead is more than the overhead required by the
proposed parcel format. These reasons justify our choice of
the simple 3-bit-headered parcel format for use in
System II.

B. Packet Format

The packet header details are the same as discussed in
[2]. With VFR transmission, we suggest the use of a
variable-length packet whereby the transmission delay (or
packet loading time) is limited. Our recommendation is to
limit the packet size such that the packet loading time is
less than, say, 400 msec. In other words, a packet is
transmitted either when it is fully loaded with an integer
number of parcels, or when the total speech duration it
represents is about 400 msec, whichever happens first.
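
The packet-loading rule can be sketched in Python as follows. The packet capacity is not stated in this note, so `capacity_bits` is a hypothetical parameter; the 19.2 msec frame interval and the 400 msec limit are taken from the text, and the function name is an assumption.

```python
def packetize(parcel_bits, capacity_bits, frame_ms=19.2, max_ms=400.0):
    """Group parcel sizes (in bits, one per analysis frame) into packets.

    A packet is closed when the next parcel would not fit, or when adding
    it would push the speech duration represented past max_ms.
    """
    packets, current, used = [], [], 0
    for bits in parcel_bits:
        too_full = used + bits > capacity_bits
        too_long = (len(current) + 1) * frame_ms > max_ms
        if current and (too_full or too_long):
            packets.append(current)
            current, used = [], 0
        current.append(bits)
        used += bits
    if current:
        packets.append(current)
    return packets
```

With header-only parcels (3 bits each) and a large capacity, the duration limit governs: at 19.2 msec per frame, a packet closes after 20 parcels (384 msec).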

Since  the  proposed  parcel  format  does  not  restrict  the 
interval  between  two  successive  parameter  transmissions,  it 
can  happen  that  a  packet  is  full  of  parcels  having  header 
bits  only  (i.e.,  no  parcel  has  parameter  data  in  it).  This 
event  happens  usually  for  long  pauses  or  silence.  If  the 
silence  duration  exceeds  1  sec,  the  silence  detection 
algorithm  steps  in  to  send  a  silence  packet.  If  the 
duration  is  less  than  1  sec,  it  is  possible  to  have  even  two 
successive  packets  containing  header-only  parcels.  This 
poses  a  problem  if  the  receiver  performs  parameter 
interpolation  between  transmissions  inasmuch  as  the  receiver 
has to buffer two or more packets, thus producing a large
reconstitution delay. We have thought of a number of
solutions  to  this  problem,  such  as  forcing  a  packet  to  have 
at  least  one  data  parcel.  The  following  solution  seems  to 
be  the  most  reasonable  one.  When  a  parameter  transmission 
interval  exceeds,  say,  100  msec,  then  the  last  transmitted 
parameter  values  are  used  for  the  duration.  (The  value, 
100  msec,  is  given  here  only  as  a  guide.  Other  reasonable 
values  may  be  used.)  Thus,  when  a  long  transmission  interval 
(less  than  1  sec)  is  encountered,  this  method  repeats  the 
last  transmitted  data  for  all  analysis  frames  in  the 
interval,  except  the  last  stretch  of  less  than  100  msec 
duration  for  which  interpolation  is  performed  to  generate 
the  parameter  data. 
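
One possible reading of this hold-then-interpolate rule is sketched below in Python. The choice of linear interpolation, the limit of 5 frames (96 msec at 19.2 msec per frame) and the function name are all assumptions; the note itself gives 100 msec only as a guide and does not fix the interpolation method.

```python
def reconstruct(v0, v1, gap, interp_frames=5):
    """Parameter values for frames 1..gap after a transmission of v0,
    where the next transmitted value v1 arrives at frame `gap`.

    Frames beyond the interpolation window repeat v0; only the last
    `interp_frames` frames are interpolated toward v1.
    """
    hold = max(0, gap - interp_frames)   # frames 1..hold repeat v0
    span = gap - hold                    # frames covered by interpolation
    values = []
    for j in range(1, gap + 1):
        if j <= hold:
            values.append(v0)
        else:
            values.append(v0 + (v1 - v0) * (j - hold) / span)
    return values
```

For a gap of 7 frames between values 0 and 10, the first two frames repeat 0 and the last five ramp linearly up to 10.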

C. Negotiations

We suggest an update of the present NVP program to
include the various <WHAT> and <HOW> negotiations given on
pp. 6-7 in [2]. This recommendation calls for
parameterization of analysis and synthesis programs in terms
of variables such as sample period, LPC order, and samples
per parcel (or interframe interval, IFI). For sample period
= 150 microseconds, IFI may be either 9.6 msec (64 samples)
or 19.2 msec (128 samples). The coding/decoding tables
given in Section IV constitute table-set 2 for the
negotiation item 10 on p. 7 in [2].

experiment with different transmission criteria by changing
the  transmitter  program  only,  without  having  to  worry  about 
the  receiver  programs  located  either  locally  (back-to-back 
mode)  or  remotely. 

As  mentioned  in  Section  II,  we  recommend  the  use  of 
separate  transmission  criteria  for  pitch,  gain  and 
reflection  coefficients.  Below  we  present  previously  tested 
transmission  criteria  for  reflection  coefficients,  and 
mention  possibilities  that  are  being  currently  investigated 
for  pitch  and  gain. 

A. Reflection Coefficients

We  shall  consider  a  specific  transmission  criterion  for 
reflection  coefficients.  This  is  the  so-called  likelihood 
ratio  or  ratio  of  prediction  residual  energies  [3-5].  This 
VFR  scheme  transmits  the  reflection  coefficients  of  a  given 
analysis  frame  only  if  the  likelihood  ratio  computed  between 
that  frame  and  the  last  transmitted  frame  exceeds  a 
threshold,  denoted  by  LRT  (likelihood  ratio  threshold). 

To compute the likelihood ratio, we need to compute for
each analysis frame the autocorrelations $\{b_i\}$ of the
predictor coefficients $\{a_i\}$:

$$b_i = \sum_{j=0}^{M-i} a_j\, a_{j+i}, \quad a_0 = 1, \quad 0 \le i \le M,$$

where M is the predictor order. The analysis program should
compute these M+1 autocorrelations and transfer them along
with the already available preemphasized speech
autocorrelations $\{R_i\}$ and minimum residual energy $\alpha_M$ to the
transmitter program containing the VFR scheme.

Below is a step-by-step procedure of the VFR
transmission scheme. The superscript n used with the
quantities $b_i$, $R_i$ and $\alpha_M$ denotes their values corresponding
to the n-th analysis frame.


(1) Transmit coefficients of frame n.
    Save $b_j^{(n)}$, $0 \le j \le M$.
    $i \leftarrow 0$.

(2) $i \leftarrow i + 1$.
    Obtain $R_j^{(n+i)}$, $0 \le j \le M$, and $\alpha_M^{(n+i)}$.
    $$D = b_0^{(n)} R_0^{(n+i)} + 2 \sum_{j=1}^{M} b_j^{(n)} R_j^{(n+i)} - \alpha_M^{(n+i)}\, \mathrm{LRT}$$

(3) If $D \le 0$, go to (2). (No transmission.)

(4) $n \leftarrow n + i$, go to (1).

We suggest a value of LRT=1.4 for System II.

Earl Craighill has told us about an approximation
(originally suggested by Steve Boll) to the likelihood ratio
in terms of reflection coefficients of appropriate analysis
frames. Since the performance of this approximation has not
been well studied and, more importantly, since the direct
computation given above is, according to Danny Cohen, within
the time constraints of existing real-time implementations,
we have not presented the details of the approximation.
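
The direct computation in steps (1) to (4) can be sketched in Python as follows. Each frame is represented here as a tuple `(a, R, alpha)` holding the predictor coefficients with `a[0] = 1`, the speech autocorrelations and the minimum residual energy; this representation and the function names are assumptions made for illustration.

```python
def predictor_autocorrelation(a):
    """b_i = sum_{j=0}^{M-i} a_j a_{j+i}, 0 <= i <= M, with a[0] = 1."""
    M = len(a) - 1
    return [sum(a[j] * a[j + i] for j in range(M - i + 1))
            for i in range(M + 1)]


def select_frames(frames, lrt=1.4):
    """Indices of the frames whose reflection coefficients are sent."""
    sent = [0]                                   # step (1) for the first frame
    b_ref = predictor_autocorrelation(frames[0][0])
    for idx in range(1, len(frames)):            # step (2)
        a, R, alpha = frames[idx]
        D = (b_ref[0] * R[0]
             + 2.0 * sum(bj * Rj for bj, Rj in zip(b_ref[1:], R[1:]))
             - alpha * lrt)
        if D > 0:                                # steps (3) and (4)
            sent.append(idx)
            b_ref = predictor_autocorrelation(a)
    return sent
```

As a check on the formula, comparing a frame with itself gives a likelihood ratio of exactly 1, so D is negative for any LRT above 1 and no transmission occurs.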

Other  Suggestions 

We  have  investigated  two  modifications  of  the  above 
basic  likelihood  ratio  method  in  the  context  of  developing  a 
1000  bps  LPC  system  [1].  These  may  be  used  in  System  II  to 
improve  speech  quality  primarily. 

1. The first modification is to use a slightly higher
threshold (about 5-10% higher*) for unvoiced sounds
than for voiced sounds. When a transmission interval
contains a transition between voiced and unvoiced
sounds, the lower threshold is always employed to
encourage a transmission.

2. The second modification involves the use of a double
threshold strategy. Two likelihood ratio thresholds,
LRT1 and LRT2, are employed in this scheme. LRT2 may
be about 20% higher* than LRT1 (e.g. LRT1=1.4 and
LRT2=1.7). The idea behind this modification is that
if the likelihood ratio between a current frame and
the previously transmitted frame exceeds only LRT1,
and not LRT2, then the current frame is transmitted;
if it exceeds both thresholds, then the frame
immediately preceding the current frame is
transmitted. The latter step avoids having to do
parameter interpolation between largely different
data frames. A step-by-step procedure of the
modified scheme is given on the next page.

*These percentage figures are different from those given in
[1] because there we used the logarithm of the likelihood
ratio in the transmission criterion.

(1) Transmit coefficients of frame n

As a first step, we recommend implementing the basic
likelihood  ratio  method.  Later,  one  may  want  to  try  out 
some  variations,  such  as  the  ones  discussed  above.  Such 
experimentation  may  be  facilitated  by  having  the  transmitter 
program  reside  in  a  computer  that  allows  the  program  changes 
to be done relatively easily (e.g. PDP-11 rather than
SPS-41).
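
The double threshold variation described in the second modification can be sketched in Python as follows. Only the prose description is followed here, so the handling of edge cases is an assumption: when the preceding frame is itself the last transmitted frame, the current frame is sent instead. The callback `ratio(ref, cur)` and the function name are hypothetical.

```python
def select_frames_double(ratio, n_frames, lrt1=1.4, lrt2=1.7):
    """Indices of transmitted frames under the double threshold strategy.

    ratio(ref, cur) must return the likelihood ratio between frame `cur`
    and the last transmitted frame `ref`.
    """
    sent = [0]
    cur = 1
    while cur < n_frames:
        ref = sent[-1]
        lr = ratio(ref, cur)
        if lr > lrt2 and cur - 1 > ref:
            # Exceeds both thresholds: send the preceding frame, then
            # re-test the current frame against that new reference.
            sent.append(cur - 1)
            continue
        if lr > lrt1:
            sent.append(cur)
        cur += 1
    return sent
```

The loop always terminates, because after the preceding frame has been sent it becomes the reference and the first branch cannot be taken again for the same current frame.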


B.  Pitch  and  Gain 


Currently,  we  are  investigating  transmission  criteria 
(separate  for  pitch  and  gain)  which  transmit  the  parameter 
if  it  has  changed  by  more  than  a  prespecified  amount  since 
the  last  transmission.  We  will  report  the  results  of  this 
work  in  a  later  NSC  note.  The  step-by-step  description  of  a 
typical  scheme  is  given  below,  where  T  denotes  a  preselected 
threshold. (A double threshold strategy may also be used
here.)


(1) Transmit value at frame n.
    $i \leftarrow 0$.

(2) $i \leftarrow i + 1$.
    $D = |(\text{frame } n+i \text{ value}) - (\text{frame } n \text{ value})| - T$

(3) If $D \le 0$, go to (2).

(4) $n \leftarrow n + i$, go to (1).

For now, we recommend implementing the simple method of
transmitting gain at a fixed rate of every 19.2 msec, and
pitch also at the same fixed rate except during an unvoiced
region where only the pitch value (=0) of the first unvoiced
frame is transmitted; the receiver continues the unvoiced
status until a new pitch value is received.
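
The threshold scheme of steps (1) to (4) for pitch or gain can be sketched in Python as follows. The use of the absolute difference is an assumption based on the phrase "changed by more than a prespecified amount"; the function name is hypothetical.

```python
def select_by_change(values, T):
    """Indices of frames whose parameter value is transmitted."""
    sent = [0]                                   # step (1)
    for idx in range(1, len(values)):            # step (2)
        D = abs(values[idx] - values[sent[-1]]) - T
        if D > 0:                                # steps (3) and (4)
            sent.append(idx)
    return sent
```

For the values 10, 11, 12, 15, 15, 9 and T = 2, frames 0, 3 and 5 are transmitted; frame 2 is not, because a change exactly equal to T gives D = 0.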

IV.  CODING/ DECODI  I' 9  TABLES 

For  System  II,  wr  recommend  the  use  of  a  new  set  of 
coding/decoding  tables  for  transmission  parameters.  The 
gain  table  in  the  new  set  is  the  same  as  that  given  in  NSC 

Note  68  [2]  except  for  a  suggestion  of  using  a  nonzero 

decoded  value  for  the  zero  level.  The  pitch  table  has 
been  designed  in  such  a  way  that  decoded  values  are  unique 
(or  unequal)  thus  employing  the  available  Quantization 
levels  more  efficiently  [6].  Tables  for  reflection 
coefficients,  on  the  other  hand,  have  been  designed  to 
employ  fewer  total  number  of  bits  than  what  the  tables  of 
System  I  require.  The  resulting  bit  savings  (about  20 
bits/transmitted  frame)  are  due  to:  1)  the  use  of  smaller 
parameter  ranges  obtained  from  real  speech  data,  2)  the 
efficient  selection  of  step  sizes  for  the  different 
parameters  (log  area  ratios  or  LARs)  based  on  the  spectral 

sensitivity  concept  [1],  and  3)  the  LPC  order  M  being  9 

instead  of  10.  As  an  important  consequence,  a  different 
table  is  proposed  for  e'ch  reflection  coefficient  [1]. 

A.  Bit Allocation

The new quantization tables given below are based on
the following bit allocation: pitch = 6 bits; gain = 5 bits;
9 reflection coefficients k(1) to k(9), in that order = 5,
5, 5, 4, 4, 4, 3, 3, 3 bits. Thus, a transmitted frame of
data (parcel) has a maximum of 47 data bits (plus 3 header
bits).
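
The bit allocation above can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are our own and do not come from any System II implementation.

```python
# Bit allocation per parameter, as listed in the text.
PITCH_BITS = 6
GAIN_BITS = 5
REFLECTION_BITS = [5, 5, 5, 4, 4, 4, 3, 3, 3]  # k(1) .. k(9)
HEADER_BITS = 3


def parcel_bits(include_header: bool = False) -> int:
    """Maximum number of bits in one transmitted parcel."""
    total = PITCH_BITS + GAIN_BITS + sum(REFLECTION_BITS)
    return total + HEADER_BITS if include_header else total


def levels(bits: int) -> int:
    """Number of quantization levels J available with `bits` bits."""
    return 2 ** bits


print(parcel_bits())       # data bits only
print(parcel_bits(True))   # data bits plus the parcel header
print(levels(PITCH_BITS))  # number of pitch levels
```

The number of levels per parameter is what fixes the size of each coding/decoding table that follows.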

Our feeling is that a 9-th order LPC analysis is
adequate for a sampling rate of 6.7 kHz. However, if one
wants to have M=10, we suggest duplicating the
coding/decoding table of the 9-th coefficient to be used for
the 10-th.


B.  General  Comments  About  Quantization  Tables 

Pitch and gain tables given in the following pages are
arranged in three columns "X(J)", "J" and "R(J)", while the
tables for the reflection coefficients have two additional
columns "INDEX(J)" and "INDEXP(J)". (These two columns are
explained later.) Notice that the entries in the first
column "X(J)" are half a step off the other columns. This
is to indicate that intervals from the X-domain (pitch,
gain, and the reflection coefficients) are mapped into codes
or levels "J", which are transmitted over the network, to be
translated by the receiver into the values in the column
"R(J)". These intervals are open-close intervals as defined
in [2]. Values of a parameter above and below the range of
the "X(J)" column are mapped into the maximum and minimum
entries of the "J" column.
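
The table lookup described above can be sketched in Python as follows. The sketch assumes that "open-close" means each interval excludes its lower boundary and includes its upper one; reference [2] is the authority on this point, and the function names are illustrative only. The sample data are the first rows of the gain table.

```python
from bisect import bisect_left


def encode(x, thresholds):
    """Return the level J whose interval (thresholds[J], thresholds[J+1]]
    contains x. Values outside the table range are clamped to the
    minimum or maximum level, as the text prescribes."""
    j = bisect_left(thresholds, x) - 1
    return min(max(j, 0), len(thresholds) - 2)


def decode(j, decoded):
    """Receiver side: translate the transmitted level J into R(J)."""
    return decoded[j]


# First rows of the gain table, used here only as sample data.
X = [0, 20, 22, 26, 30, 36]   # interval boundaries, "X(J)" column
R = [0, 20, 24, 28, 33]       # decoded values, "R(J)" column

for value in (21, 22, 23, 1000):
    level = encode(value, X)
    print(value, level, decode(level, R))
```

A binary search keeps the encoder cost logarithmic in the number of levels, which matters little for 5- or 6-bit tables but keeps the sketch general.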

C.  Pitch  Table 

The pitch table given here is the "optimal" solution
presented in NSC Note 49 [6]. Briefly, the logarithm of the
pitch period in number of samples was quantized. A
difficulty arises in attempting to quantize the log pitch in
that at the high frequency end (small pitch period) of the
range of interest, the quantization bin size, as found by
dividing the log pitch scale into equal segments, can be so
small as to result in cases where two distinct quantization
bins yield the same decoded value, thus wasting some
quantization levels. To derive the pitch coding and decoding
tables, we used a method which ensures maximum usage of all
the available quantization levels [6].

The scaling of the pitch value obtained from the SIFT
program is the same as before. (Scale up by shifting 9
places to the left, i.e., multiplying by 512. Since NSC
Note 42 has not been issued yet, the only reference for this
scaling seems to be NSC Note 36 [7].)

The level J=0 defines the unvoiced condition. The
receiver decodes it as the interframe interval (IFI)
expressed in number of samples. As we recommended in

NEW PITCH TABLE

    X(J)   J  R(J)        X(J)   J  R(J)        X(J)   J  R(J)

       0                  7254                 12031
           0    64*               22    40            43    68
       0                  7424                 12265
           1    19                23    41            44    70
    3840                  7595                 12636
           2    20                24    42            45    72
    4011                  7764                 12969
           3    21                25    43            46    74
    4182                  7942                 13313
           4    22                26    44            47    76
    4352                  8085                 13654
           5    23                27    45            48    78
    4523                  8362                 13995
           6    24                28    47            49    80
    4694                  8641                 14336
           7    25                29    48            50    82
    4864                  8789                 14678
           8    26                30    49            51    84
    5035                  8940                 15018
           9    27                31    50            52    86
    5206                  9213                 15366
          10    28                32    52            53    88
    5376                  9502                 15680
          11    29                33    53            54    90
    5547                  9613                 16126
          12    30                34    54            55    93
    5718                  9906                 16583
          13    31                35    56            56    95
    5888                 10154                 16874
          14    32                36    57            57    97
    6059                 10410                 17301
          15    33                37    59            58   100
    6230                 10669                 17862
          16    34                38    60            59   103
    6400                 10919                 18261
          17    35                39    62            60   105
    6571                 11188                 18667
          18    36                40    63            61   108
    6742                 11404                 19201
          19    37                41    65            62   111
    6912                 11806                 19733
          20    38                42    67            63   114
    7083                 12031              Infinity
          21    39
    7254

*This value is the interframe interval in number of samples.

Section II, IFI is a variable whose value is decided at the
time of the negotiations. The pitch table gives a decoded
value of 64 for J=0, assuming IFI=9.6 msec. For any other
value of IFI, this decoded value has to be changed.
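
As a hedged illustration of how the J=0 entry might be recomputed for another interframe interval, the sketch below scales the stated pair (64 samples at 9.6 msec) linearly. The implied rate of 64/9.6 samples per msec and the function name are assumptions made for this example, not values given in the note.

```python
def unvoiced_decoded_value(ifi_msec: float,
                           samples_per_msec: float = 64 / 9.6) -> int:
    """Decoded value for level J=0: the interframe interval expressed
    in samples. The default rate is inferred from the text's example
    (64 samples for IFI = 9.6 msec)."""
    return round(ifi_msec * samples_per_msec)


print(unvoiced_decoded_value(9.6))
print(unvoiced_decoded_value(19.2))
```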

D.  Gain Table

This is the same gain table as given in NSC Note 68
[2]. The "X(J)" column is the square root of the energy (or
the zero-lag autocorrelation R0) of the preemphasized and
windowed speech signal. The gain table assumes a maximum
X-value of 3000 and allows for a dynamic range of about
43.5 dB. (With a 12-bit A/D input (including the sign bit)
and with 128 samples in the analysis interval, R0 is assumed
to have a maximum value of about 2**23 after accounting for a
6 dB (1 bit) difference between peak and rms values of
speech [7] and a combined loss of about 12 dB (2 bits) due
to preemphasis and windowing. Notice that sqrt(2**23) is
about 3000. These numbers were supplied to us by Randy Cole.
Since they are not given in [2], we have included them in
this note.)

Our experience has shown that using R0=0 for the zeroth
level can cause perceivable problems in the synthesized
speech [1]. These problems arise due to: 1) certain very
low energy speech sections (e.g. beginnings of [h], [n],
[d]) being somewhat cut off in the synthesized version, and

GAIN (sqrt(R0)) TABLE
(Taken from NSC Note 68)

    X(J)   J  R(J)        X(J)   J  R(J)

       0                   225
           0     0*               16   245
      20                   266
           1    20                17   289
      22                   315
           2    24                18   342
      26                   372
           3    28                19   404
      30                   439
           4    33                20   478
      36                   519
           5    39                21   565
      42                   614
           6    46                22   667
      50                   725
           7    54                23   789
      59                   857
           8    64                24   932
      70                  1013
           9    76                25  1101
      83                  1197
          10    90                26  1301
      98                  1415
          11   106                27  1538
     116                  1672
          12   126                28  1818
     137                  1976
          13   148                29  2148
     161                  2335
          14   175                30  2539
     191                  2760
          15   207                31  3000
     225              Infinity

*We recommend the use of a nonzero number such as 15 (-46 dB) or
10 (-50 dB) for this decoded value.

2) having to listen to the contrast between absolute silence
and the usually noisy synthesized speech. These problems
generally disappear if we use a relatively small nonzero
energy for the level J=0. Therefore, we recommend decoding
this level as a small value such as 15 (about 46 dB lower
than the maximum value of 3000) or 10 (about 50 dB lower
than the maximum).
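
The decibel figures quoted for the gain table can be reproduced with a short calculation. In this sketch the dynamic range is taken as the ratio of the maximum X-value (3000) to the smallest nonzero decoded value in the table (20); that reading is our assumption about how the 43.5 dB figure was obtained.

```python
import math


def db(amplitude_ratio: float) -> float:
    """Amplitude ratio expressed in decibels."""
    return 20 * math.log10(amplitude_ratio)


MAX_X = 3000          # assumed maximum of sqrt(R0)
SMALLEST_NONZERO = 20  # smallest nonzero R(J) in the gain table

print(round(db(MAX_X / SMALLEST_NONZERO), 1))  # dynamic range
print(round(db(15 / MAX_X), 1))                # level 0 decoded as 15
print(round(db(10 / MAX_X), 1))                # level 0 decoded as 10
print(round(math.sqrt(2 ** 23)))               # compare with 3000
```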

E.  Tables for Reflection Coefficients

The 9 coding/decoding tables given, one for each
coefficient, represent linear quantization of log area
ratios with a different step size for each coefficient [1].
The scaling of the transmitter table values is the same as
in [2]. In other words, the "X(J)" column of the table for
the i-th reflection coefficient k(i) has entries of the form
k(i) x 2**15. The receiver table "R(J)" gives the decoded
values of the reflection coefficients in the same scaled
form. The column "INDEX(J)" gives the indices into the SPS
sine table corresponding to the decoded values, i.e., these
entries are of the form arcsin(k(i)) x 2**15/pi. These
entries refer to the "fine" SPS sine table, which calls for
additional multiplications, thus increasing the
computational time. The entries in the "INDEXP(J)" column,
on the other hand, are indices into the "coarse" sine table
only, thus requiring no such multiplications; these indices,
being integer multiples of 2**7, are the closest
approximations to the corresponding ones in the "INDEX(J)"
column. (It is important to note that we have factored 2**7
out of the entries in the last column.)
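
The relations among the "R(J)", "INDEX(J)" and "INDEXP(J)" columns can be sketched as below. Two points are assumptions of this sketch rather than statements of the note: the log area ratio is taken as 10*log10((1+k)/(1-k)), and plain rounding is used throughout. With those choices the printed K1 entries are reproduced to within a few counts. Function names are illustrative.

```python
import math

SCALE = 2 ** 15  # scaling applied to reflection coefficients


def lar_to_k(lar_db: float) -> float:
    """Reflection coefficient for a log area ratio given in dB
    (assumed definition: 10*log10((1+k)/(1-k)))."""
    ratio = 10 ** (lar_db / 10)
    return (ratio - 1) / (ratio + 1)


def sine_indices(k: float):
    """Return (INDEX, INDEXP) for an unscaled coefficient k."""
    index = round(math.asin(k) * SCALE / math.pi)  # fine sine table
    indexp = round(index / 2 ** 7)                 # coarse table, 2**7 factored out
    return index, indexp


def table_row(level_offset: int, step_db: float):
    """Return (R, INDEX, INDEXP) for the level lying `level_offset`
    steps away from the level that decodes to LAR = 0."""
    k = lar_to_k(level_offset * step_db)
    index, indexp = sine_indices(k)
    return round(k * SCALE), index, indexp


# K1 uses a 0.636 dB step; its zero level is J=26 in the printed table.
print(table_row(0, 0.636))
print(table_row(1, 0.636))
print(sine_indices(-31348 / SCALE))
```

Because the printed step size is rounded to three decimals, rows far from the zero level drift by a few counts from the printed values; the sketch is meant to show the construction, not to regenerate the tables exactly.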


As mentioned in the beginning of this section, in
deriving these tables we have used ranges of reflection
coefficients obtained from real speech data and a bit
allocation based upon the spectral sensitivity properties of
the LARs. (These ranges were obtained for 6.7 kHz sampled
speech by Lincoln Labs.) Each table lists at the top the
minimum and maximum values of the corresponding reflection
coefficient, number of bits, and the corresponding LAR step
size in dB. We have perturbed the minimum and maximum
values supplied by Lincoln Labs a little so that a zero LAR
(or equivalently a zero reflection coefficient) is quantized
with no error. (Refer to [8] for details.)


The  tables  are  asymmetric  (unlike  the  tables  in  [2]) 
insofar  as  the  assumed  minimum  value  of  any  reflection 
coefficient  is  not  equal  to  the  negative  of  its  assumed 
maximum  value. 

TABLE FOR REFLECTION COEFFICIENT K1

MIN VALUE = -0.960, MAX VALUE = 0.383, NO. OF BITS = 5
LOG AREA RATIO STEP SIZE = 0.636 DB

    X(J)    J     R(J)   INDEX(J)   INDEXP(J)
                                     (X 2**7)

  -31446
            0   -31348     -13302        -104
  -31243
            1   -31130     -13072        -102
  -3100?
            2   -30878     -12825        -100
  -30739
            3   -30590     -12560         -98
  -30430
            4   -30260     -12276         -96
  -30077
            5   -29881     -11973         -94
  -29672
            6   -29449     -11649         -91
  -29210
            7   -28955     -11302         -88
  -29663
            8   -28394     -10933         -85
  -26065
            9   -27756     -10539         -82
  -27406
           10   -27034     -10120         -79
  -26639
           11   -26220      -9675         -76
  -25775
           12   -25304      -9203         -72
  -24805
           13   -24278      -8704         -68
  -23722
           14   -23136      -8176         -64
  -22516
           15   -21868      -7621         -60
  -21166
           16   -20471      -7038         -55
  -19722
           17   -18939      -6428         -50
  -18123
           18   -17273      -5791         -45
  -16389
           19   -15473      -5129         -40
  -14524
           20   -13544      -4444         -35
  -12534
           21   -11495      -3738         -29
  -10429
           22    -9338      -3014         -24
   -8224


BBN  Report  No.  3263 


Bolt  Beranek  and  Newman  Inc. 


(TABLE FOR K1 CONTINUED)

    X(J)    J     R(J)   INDEX(J)   INDEXP(J)
                                     (X 2**7)

   -8224
           23    -7089      -2274         -18
   -5936
           24    -4768      -1523         -12
   -3587
           25    -2397       -764          -6
   -1200
           26        0          0           0
    1200
           27     2397        764           6
    3587
           28     4768       1523          12
    5936
           29     7089       2274          18
    8224
           30     9338       3014          24
   10429
           31    11495       3738          29
   12534

TABLE FOR REFLECTION COEFFICIENT K2

MIN VALUE = -0.449, MAX VALUE = 0.956, NO. OF BITS = 5
LOG AREA RATIO STEP SIZE = 0.646 DB

    X(J)    J     R(J)   INDEX(J)   INDEXP(J)
                                     (X 2**7)

  -14718
            0   -13729      -4509         -35
  -12709
            1   -11658      -3794         -30
  -10580
            2    -9475      -3060         -24
   -8346
            3    -7196      -2309         -18
   -6026
            4    -4841      -1547         -12
   -364?
            5    -2434       -775          -6
   -1219
            6        0          0           0
    1219
            7     2434        775           6
    364?
            8     4841       1547          12
    6026
            9     7196       2309          18
    8346
           10     9475       3060          24
   10580
           11    11658       3794          30
   12709
           12    13729       4509          35
   14718
           13    15675       5203          41
   16598
           14    17488       5872          46
   1834?
           15    19162       6515          51
   19947
           16    20697       7130          56
   21412
           17    22094       7718          60
   22742
           18    23358       8277          65
   2394?
           19    24495       8807          69
   25018
           20    25512       9308          73
   25979
           21    26418       9781          76
   26???
           22    27222      10226          80
   27588

(TABLE FOR K2 CONTINUED)

    X(J)    J     R(J)   INDEX(J)   INDEXP(J)
                                     (X 2**7)

   27588
           23    27932      10645          83
   28255
           24    28558      11038          86
   28842
           25    29108      11407          89
   29156
           26    29589      11752          92
   29807
           27    30010      12074          94
   30200
           28    30378      12375          97
   30541
           29    30698      12656          99
   30842
           30    30976      12919         101
   31101
           31    31218      13163         103
   31327
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S:- 

TARLF 

FOP 

REFLECTION  COEFFICIENT  K3 

i  MIN  VALUE-  *0 

.911 

*  MAX  VALUE- 
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OF  PITS- 

B 

1  LOG 
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DP 

3  X(j) 

J 
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INDEX(J) 

INDEXPU) 

t  X  2**7) 

■ 

•  *  . 

j  -29856 

p .< 

w  *  , 

0 
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•  92 

•29410 

l 
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•8945 
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0 
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•21 497 

-y 

•20030 

10 

*20781 
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•56 

It 

•19?45 

•6547 

•51 

J 
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•17568 
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•16677 
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-9526 

•3077 
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19 
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20 

0 

0 

0 
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22 
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12 

(TABLE FOR K3 CONTINUED)

    X(J)    J     R(J)   INDEX(J)   INDEXP(J)
                                     (X 2**7)

    6060
           23     7236       2322          18
    8392
           24     9526       3077          24
   10636
           25    11720       3815          30
   12774
           26    13799       4534          35
   14791
           27    15751       5230          41
   16677
           28    17568       5902          46
   18424
           29    19245       6547          51
   20030
           30    20781       7165          56
   21477
           31    22178       7754          61
   22826

TABLE FOR REFLECTION COEFFICIENT K4

MIN VALUE = -0.315, MAX VALUE = 0.822, NO. OF BITS = 4
LOG AREA RATIO STEP SIZE = 0.808 DB

    X(J)    J     R(J)   INDEX(J)   INDEXP(J)
                                     (X 2**7)

•10308 
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0 

-8915 

•2874 

•22 

-4543 

1 
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*1523 

2 
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-969 

-8 
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3 

0 

0 
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4 
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8 
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5 
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15 
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6 
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22 
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7 
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30 
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8 

1  42  3',i 

4686 

37 
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9 
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43 
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10 
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50 
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11 
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56 
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12 
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61 
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13 
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67 
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14 
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9187 

72 

26934 

15 

26421 

9782 

TABLE FOR REFLECTION COEFFICIENT K5

VALUF*  -0 
TABLE FOR REFLECTION COEFFICIENT K6


VALUFs  •( 
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MAX  VALUE* 
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APFA 

RATIO  STEP  5 
TABLE FOR REFLECTION COEFFICIENT K7
TABLE FOR REFLECTION COEFFICIENT K8
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•1226 

•It) 
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19 
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37 

15907 

7 

1  7  339 

5R16 

45 

18685 


TARLF 

V.  INTERPOLATION AND SYNTHESIS

We suggest that at the receiver the decoded parameters
be (linearly) interpolated between parameter receptions.
When the transmission interval for a parameter exceeds
100 msec, it is recommended that the last received parameter
value be repeated. This issue was discussed in more detail
in Section II-B.

For the implementations using the SPS-41/PDP-11 system,
programs may be written for the PDP-11 to supply to the
SPS-41 interpolated parameters at intervals of IFI msec,
which the SPS-41 can further interpolate to update the
synthesizer parameters at smaller intervals (e.g. every
4.8 msec).
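
A minimal sketch of the interpolation rule suggested above, assuming times in msec and a single scalar parameter. The function name and argument layout are illustrative and are not taken from the SPS-41 or PDP-11 programs.

```python
def interpolate(t, t_prev, v_prev, t_next, v_next, max_gap_msec=100.0):
    """Parameter value at time t, with t_prev <= t <= t_next.

    Linear interpolation between two receptions; if the receptions
    are more than max_gap_msec apart, the last received value is
    repeated instead."""
    gap = t_next - t_prev
    if gap <= 0 or gap > max_gap_msec:
        return v_prev
    return v_prev + (v_next - v_prev) * (t - t_prev) / gap


# Halfway between two receptions 9.6 msec apart.
print(interpolate(4.8, 0.0, 10.0, 9.6, 20.0))
# Receptions 120 msec apart: hold the previous value.
print(interpolate(50.0, 0.0, 10.0, 120.0, 20.0))
```

The same routine could be applied twice, once at IFI spacing and once at the finer synthesizer update interval, which mirrors the two-stage arrangement described in the text.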

VI.  SUMMARY OF SPECIFICATIONS

Analysis is done every IFI=9.6 msec. An LPC order of
M=9 is recommended. VFR transmission of reflection
coefficients is accomplished using the basic likelihood
ratio method, where the threshold LRT=1.4. Pitch and gain
are transmitted at a fixed rate of every 19.2 msec. During
an unvoiced region, only the first pitch value (=0) is
transmitted. New coding/decoding tables are employed to
quantize pitch, gain and reflection coefficients with 6, 5
and 36 bits respectively. A parcel of data bits with a
3-bit header is transmitted every 9.6 msec. A
variable-length packet representing a maximum speech
duration of 400 msec is recommended. Parameter
interpolation between transmissions is suggested.

For the specified VFR transmission scheme, the average
frame rate for reflection coefficients is about 37
frames/sec; that for gain is 52 frames/sec; that for pitch
is less than about 40 frames/sec. A reasonable estimate of
the average frame rate for all the transmission parameters
is about 40 frames/sec. This corresponds to a data rate of
40(5+6+36) = 1880 bps. The bit rate due to the 3-bit parcel
overhead is 104x3=312 bps. Thus, we estimate the average
bit rate to be on the order of 2200 bps for continuous
speech. Explicit silence detection as being done in
System I is expected to drop this rate to about 1000 bps or
less, depending upon the proportion of silence relative to
speech.
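
The bit-rate estimate above can be retraced as follows. This is only a sketch of the arithmetic; the 40 frames/sec average is the estimate quoted in the text, and the variable names are ours.

```python
IFI_MSEC = 9.6             # one parcel (with header) per IFI
DATA_BITS = 5 + 6 + 36     # gain + pitch + reflection coefficients
HEADER_BITS = 3
AVG_FRAME_RATE = 40        # frames/sec, estimate from the text

data_bps = AVG_FRAME_RATE * DATA_BITS
parcels_per_sec = 1000 / IFI_MSEC
header_bps = parcels_per_sec * HEADER_BITS
total_bps = data_bps + header_bps

print(data_bps)               # parameter data
print(round(header_bps, 1))   # parcel header overhead
print(round(total_bps, 1))    # on the order of 2200 bps
```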

VII.  OTHER GENERAL RECOMMENDATIONS

A.  Gain Implementation

We recommend implementing the speech signal energy as a
gain multiplier at the input of the synthesizer filter.
With the gain multiplier placed at the output of the filter,
perceivable distortions are produced in the synthesized
speech at places where relatively large frame-to-frame
energy changes occur [1]. (There are, however, ad hoc
solutions to this problem.)


As mentioned in the introduction, our objective in
coming up with specifications for System II has been to
procure maximum benefit with minimum effort. In keeping
with this objective, we left out the bit-saving techniques:
variable order linear prediction, Huffman or other
(suboptimal) fancy encoding (e.g. delta coding of pitch or
gain) [4] and the optimal linear interpolation scheme which
holds potential for improving speech quality especially with
VFR transmission [9]. We suggest that these techniques, and
perhaps others as well, be considered for a future
System III.

1.  INTRODUCTION

So far, the decision on how much speech a packet should
contain for transmission over the ARPA net has been influenced
by two main factors: overhead and delay. In the present
implementation, each packet contains a maximum of 1007 data
bits, of which about 32 are needed for overhead. An
additional 200 bits of overhead (not included in the 1007) are
added by the IMP. The speech data consists of 67-bit parcels,
each of which encodes 19.2 msec of speech. (These values may
change in future systems.) The more parcels a packet
contains, the smaller the percentage of bits "wasted" in
overhead. This factor argues for maximizing the number of
parcels in each packet. On the other hand, increasing the
number of parcels per packet increases the duration of speech
encoded in the packet. Since the first parcel in the packet
cannot be transmitted until the last parcel in the same packet
has been encoded, a delay is unavoidably introduced, equal to
the duration of speech encoded in a packet. This delay is in
addition to delays due to other factors such as finite
transmission time, path length, and network response. Delays
have a serious disruptive effect on conversation (Riesz and
Klemmer, 1966; Brady, 1971), and this argues for minimizing
the duration of speech in a packet. Experiments have been
performed with two choices of speech duration per packet.
ISI has used the maximum number of parcels per packet (14),
corresponding to 268.8 msec of speech, yielding an overhead
rate of 17.5%. Lincoln Labs, on the other hand, has used up
to 7 parcels per packet, corresponding to 134.4 msec of
speech, and an overhead of 29.8%.
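
The trade-off between overhead and delay can be explored with the sketch below. The overhead count is left as a parameter because the note does not spell out its accounting: the quoted percentages (17.5% and 29.8%) are reproduced when roughly 199 overhead bits per packet are assumed, which is our inference from those figures and not a number stated in the text. Function names are illustrative.

```python
PARCEL_BITS = 67
PARCEL_MSEC = 19.2


def max_parcels(data_bits: int = 1007, packet_overhead_bits: int = 32) -> int:
    """Largest whole number of parcels that fits in one packet."""
    return (data_bits - packet_overhead_bits) // PARCEL_BITS


def speech_msec(parcels: int) -> float:
    """Duration of speech carried by a packet (also the packetizing delay)."""
    return parcels * PARCEL_MSEC


def overhead_fraction(parcels: int, overhead_bits: int) -> float:
    """Share of transmitted bits that is not speech data."""
    speech_bits = parcels * PARCEL_BITS
    return overhead_bits / (speech_bits + overhead_bits)


ASSUMED_OVERHEAD_BITS = 199  # inferred from the quoted percentages

for n in (14, 7):
    print(n, round(speech_msec(n), 1),
          round(100 * overhead_fraction(n, ASSUMED_OVERHEAD_BITS), 1))
```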

The  purpose  of  this  note  is  to  argue  that  a  third  factor 
needs  to  be  considered  in  deciding  how  much  speech  should  be 
encoded  in  one  packet  -  the  effect  of  lost  packets  on 
intelligibility.  We  propose  a  method  of  packetizing  speech 
parcels  which  will  sharply  reduce  the  effect  on  speech 
intelligibility  of  lost  (delayed)  packets. 

2.  THE PROBLEM

Whenever an utterance is longer than the typical
processing and transmission delays, reconstitution of the
waveform begins at the destination before the message ends at
the transmitter. Since packets must be reconstituted in the
correct sequence, and the sequence has already begun, a
problem arises whenever a packet is delayed. Two solutions
have been tried. Lincoln Labs has chosen to proceed without
the late packet, replacing the speech in the late packet by an
equal amount of silence. This solution discards some of the
speech waveform, but retains the overall temporal pattern of
the speech. ISI has chosen to wait for the late packet, thus
introducing a silence equal to the delay between the expected
and actual arrival times of the delayed packet (a variable).
This solution does not discard any of the speech waveform, but
the overall temporal pattern of the utterance may be
disturbed. As network traffic becomes heavier, the
interruptions introduced into the speech by the former
solution, and the long delays introduced by the latter, become
increasingly objectionable.

At the ARPA Review meeting in Reston, Virginia, December
15-16, 1975, Jim Forgie played some packet-speech that had
been sent over the ARPANET, for a variety of packet loss rates
ranging from 30% to values close to zero. Speech
intelligibility was severely affected by 30% loss rates, and
substantially affected by loss rates of a few percent.
Earlier work on the degradation of intelligibility as a result
of interrupting speech (Huggins, 1964), or introducing silent
intervals into it (Huggins, 1975a), has shown that the
degradation is critically dependent on the duration of the
resulting silent intervals. The most severe degradation
occurred when the silent intervals lasted 100-300 msec, but
intelligibility was much less affected by shorter silent
intervals. Thus it appears that the present choice of speech
duration per packet leads to silent intervals (due to lost
packets) that fall in the range that maximally degrades
intelligibility. We summarize the earlier work below, before
proposing a remedy, and tests to validate it.

2.1  Interrupted Speech


The stimulus materials in both the earlier studies were
continuous speech, consisting of readings from a book of
scientific essays. Intelligibility was measured by the number
of words in 100-word passages that listeners were able to
repeat correctly in a shadowing task, where the listener
repeats aloud, word for word, what he hears. Subjects were
run individually. The stimulus tapes for the interrupted
speech experiments were generated by switching the continuous
speech message backwards and forwards between two tape
recorders at a regular rate, so that the signal deleted by an
interruption on one tape always appeared on the other tape.
The two interrupted tapes thus produced were therefore
complementary. Switching rates varied between one-fifth and
sixteen complete cycles of alternation per second, and the
speech-silence ratio was equal to 1.0 on each tape. Thus,
silent intervals (and speech intervals) ranged in duration
from 2500 msec down to 31 msec on each tape. Twenty subjects
each shadowed one of the two tapes. At the slowest switching
rate, subjects heard half the phrases, and intelligibility was
about 50%. As the rate was increased, intelligibility first
declined  to  a  minimum  of  13.z20J,  with  speech  and  silent 
intervals  between  300  and  100  msec,  and  then  improved  rapidly 
to  80%  with  silent  intervals  of  31  msec.  (See  Fig.  1).  Thus, 
intelligibility  was  most  degraded  when  speech  and  silent 
intervals  lasted  100-300  msec,  but  was  little  affected  when 
speech and silent intervals were shortened to 31 msec, even
though 50% of the speech was missing.
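
The interval durations quoted above follow directly from the switching rate and the speech-silence ratio. A small sketch of that arithmetic (the function name is illustrative):

```python
def interval_msec(cycles_per_sec: float, speech_silence_ratio: float = 1.0):
    """Return (speech, silence) durations in msec for one cycle of
    alternation at the given switching rate."""
    period = 1000.0 / cycles_per_sec
    silence = period / (1.0 + speech_silence_ratio)
    return period - silence, silence


print(interval_msec(0.2))   # slowest rate: one-fifth cycle per second
print(interval_msec(16))    # fastest rate: sixteen cycles per second
```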

2.2  Temporally Segmented Speech

The temporally segmented speech experiments differed from
the interrupted speech experiments only in that no speech was
discarded (Huggins, 1975a). Instead, the continuous speech
message was broken up into "speech intervals" by the insertion
of silent intervals. Similar effects could be obtained by
repeatedly starting and stopping a tape recorder, if the
transport mechanism had no inertia. The durations of speech
and silent intervals were varied independently. The results
show that, with silent intervals held constant at 200 msec,
intelligibility declined from 95% to less than 20% as speech
interval duration was decreased from 200 msec to 30 msec.
(See Fig. 2, Curve A). On the other hand, with speech
intervals held constant at 63 msec, intelligibility remained
low (about 50%, the level depending only on speech interval
duration) as silent intervals were shortened from 500 msec to
125 msec, then suddenly and rapidly recovered as silent
intervals were reduced from 125 to 63 msec. At 63 msec or
below, intelligibility was close to 100% (See Fig. 2, Curve


These results strongly support the hypothesis that the
V-shaped minimum of intelligibility found in a variety of
experiments of this sort, of which Figure 1 is an example, is
produced by the overlap of two separate effects. The decline
of intelligibility as speech and silent interval durations are
shortened towards 100 msec is due to the decreasing amount of
information in the speech intervals, together with the fact
that the silent intervals are too long for the ear to be able
to "bridge" them. Other experiments (Huggins, 1974; Wingfield
and Wheale, 1975) have shown that this decline is affected by
speech rate, and the variable defining the decline is the
amount of speech in each speech interval (i.e. the number of
syllables, phonemes, etc.) rather than its duration. On the
other hand, the recovery of intelligibility as speech and
silent intervals are further shortened is due to the ear's
increasing ability to bridge the silent intervals as they are
shortened. The recovery due to the gap-bridging takes place
despite the progressive decline of intelligibility of the
speech intervals, as they are shortened. The recovery is not
dependent in the same way on speech rate (Huggins, 1975b).

How are the foregoing experiments related to the effects
of lost speech packets? At present, each lost packet
introduces a silent interval lasting 135-270 msec. These
silences are too long for the ear to bridge. As long as their
rate of occurrence is low they have only a small effect on
intelligibility, since the intervals of speech occurring
between successive silences tend to be quite long. As the
rate of lost packets increases, the duration of intact speech
intervals declines, with serious effects on intelligibility.


The  tasks  in  the  foregoing  experiments  are  quite  similar 
to  conditions  a  vocoder  user  might  actually  encounter.  The 
shadowing  task  can  be  thought  of  as  increasing  the  processing 
load  on  the  listener.  Although  a  real-life  user  would  not 
normally  repeat  all  he  heard,  word-for-word,  and  might 
therefore  better  understand  the  more  difficult  passages,  he 
might  easily  have  other  secondary  tasks  to  perform,  or  be 
operating  under  adverse  conditions,  which  could  produce 
increases  in  processing  load  similar  to  those  induced  by  the 
shadowing  task. 


There  are,  however,  two  aspects  of  the  tasks  that  are  not 
very  realistic.  First,  the  silent  intervals  were  regularly 
spaced  in  time,  whereas  one  would  expect  late-arriving  packets 
to  occur  randomly  in  time.  However,  two  earlier  studies 
suggest  that  randomly  timed  deletions  would  produce 
intelligibility  decrements  similar  to  those  obtained  with 
regular  deletion.  Miller  and  Licklider  (1950)  reached  this 
conclusion  in  their  study  of  the  intelligibility  of  PB  word 
lists  subjected  to  regular  and  to  random  interruptions,  and 
Cherry  (1953)  mentions  the  same  conclusion  in  his  first  study 
of  speech  alternated  between  the  ears.  (See  Huggins  (1964) 
for  arguments  that  alternated  and  interrupted  speech  show
reduced  intelligibility  for  the  same  reason).

Secondly,  the  proportion  of  speech  discarded  in  the
interrupted  speech  experiment  described  above  was  50%,  and  it
is  unlikely  that  packet  loss  rates  on  the  ARPANET  would  ever
be  this  high.  On  the  other  hand,  Jim  Forgie's  demonstration
at  the  Reston  meeting  showed  that  intelligibility  can  be
affected  by  even  quite  low  loss  rates.

is.  A  REMEDY 

The  most  obvious  remedy  for  the  problem  of  lost  packets 
is  to  increase  the  redundancy  of  transmission,  so  that  speech 
parcels  do  not  get  lost.  Two  obvious  ways  of  increasing 
redundancy  are,  1)  to  transmit  each  packet  twice,  and  2)  to 
arrange  that  each  parcel  of  speech  is  transmitted  in  two 
different  packets.  These  procedures  effectively  square  the 
probability  of  a  lost  packet,  but  at  a  cost  of  raising  the 
overhead  to  a  minimum  of  58.7%,  since  one  of  every  two  packets 
contains  no  new  information. 

There  are  other  possibilities.  All  the  studies  mentioned 
above  agreed  in  the  conclusion  that  the  disruption  of 
intelligibility  becomes  less  severe  as  the  duration  of  the 
silent  intervals  is  reduced.  The  ideal  way  of  reducing  the 
intelligibility  deficit,  resulting  from  lost  packets,  is  to 
substitute  the  loss  of  parcels  for  the  loss  of  packets.  The
loss  of  a  single  parcel  results  in  a  silence  of  19.2  msec, 
which  produces  a  negligible  effect  on  intelligibility,  even  at
high  loss  rates. 

There  are  two  ways  to  achieve  the  replacement  of  lost 
packets  by  lost  parcels.  One  is  simply  to  equate  parcels  and 
packets,  transmitting  a  single  parcel  in  each  packet.  This 
would  virtually  eliminate  the  intelligibility  loss,  even  at 
loss  rates  approaching  50%.  Note  also  that  this  solution
would  almost  eliminate  that  part  of  the 
speech-input-to-speech-output  delay  generated  during  coding 
and  packing  the  speech  for  transmission.  The  cost,  again,  is 
in  greatly  reduced  efficiency  of  transmission.  About  75%  of
transmitted  bits  would  be  overhead,  if  every  packet  contained 
only  a  single  parcel.  This  remedy  is  therefore  less  efficient 
than  transmitting  each  packet  twice. 
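
The overhead figures quoted above can be checked with a little arithmetic. The sketch below is only a consistency check, not the calculation used in this report. It assumes that the fixed per-packet overhead costs about as many bits as three parcels, a value inferred here from the statement that one parcel per packet gives about 75% overhead, and it takes the largest packet to hold 14 parcels, from the longest silence quoted earlier (270 msec at 19.2 msec per parcel). The names HEADER_PARCELS and overhead_fraction are illustrative.

```python
# Illustrative overhead arithmetic. HEADER_PARCELS is an assumption inferred
# from the "about 75%" figure in the text, not a value given in the report.
HEADER_PARCELS = 3.0      # assumed per-packet overhead, in parcel-sized units
PARCEL_MSEC = 19.2        # duration of one parcel (from the text)


def overhead_fraction(parcels_per_packet, copies=1):
    """Fraction of transmitted bits that carry no new speech."""
    sent = copies * (HEADER_PARCELS + parcels_per_packet)
    return 1.0 - parcels_per_packet / sent


# Longest silence quoted for one lost packet is 270 msec, i.e. 14 parcels.
largest_packet = int(270 / PARCEL_MSEC)

one_parcel_per_packet = overhead_fraction(1)
normal = overhead_fraction(largest_packet)
sent_twice = overhead_fraction(largest_packet, copies=2)

print(f"one parcel per packet: {one_parcel_per_packet:.1%}")
print(f"{largest_packet} parcels per packet: {normal:.1%}")
print(f"each packet sent twice: {sent_twice:.1%}")
```

Under these assumptions the sketch gives 75.0% for one parcel per packet and about 58.8% for sending each 14-parcel packet twice, which is close to the 58.7% minimum quoted earlier. Smaller packets give a higher figure, which is why the quoted value is a minimum.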

A  way  of  reducing  the  overhead  costs  of  both  the
foregoing  solutions  (repeating  packets,  and  one  parcel  per 
packet)  would  be  to  adopt  the  less  efficient  procedure  only 
when  packet  loss  rates  are  becoming  objectionably  high, 
perhaps  under  feedback  control  of  the  receiver.  A 
disadvantage  of  this  approach  is  that  the  most  probable  reason 
for  a  packet  being  delayed  is  that  the  net  is  being  heavily

iu  PROPOSED  SOLUTION.

A  second  way  of  replacing  lost  packets  by  lost  parcels  is
to  distribute  the  parcels  between  several  packets  in  such  a
way  that  loss  of  a  packet  does  not  result  in  loss  of  adjacent
parcels.  This  could  be  achieved  by  interleaving  -  that  is,  by
transmitting  odd-numbered  parcels  in  one  packet,  and 
even-numbered  parcels  in  a  second.  The  loss  of  one  packet 
would  then  result  in  a  brief  burst  of  interrupted  speech,  at  a 
rate  of  25  interruptions  per  second,  which  would 
(extrapolating  from  Figure  1)  have  a  negligible  effect  on 
intelligibility,  even  at  quite  high  loss  rates. 


The  proposed  solution  does  not  increase  the  overhead, 
since  it  effectively  takes  advantage  of  the  redundancy 
inherent  in  the  speech  waveform,  rather  than  adding  redundancy 
deliberately.  It  effectively  squares  the  probability  that  a
lost  packet  will  result  in  a  silent  interval,  since  the  loss 
of  one  packet  results  in  a  burst  of  interrupted  speech,  and 
two  sequential  packets  must  be  lost  for  a  silent  interval  to 
occur.
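
The effect described above can be illustrated with a small sketch. It is not the implementation proposed in this note: the function names, the choice of six parcels per packet, and the use of parcel numbers in place of coded speech are all assumptions made for illustration. The sketch packs a run of parcels into odd and even packets, drops some packets, and measures the longest run of consecutive missing parcels.

```python
# Hypothetical sketch of odd/even interleaving. Parcels are represented by
# their 1-based number, and six parcels per packet is an assumed value.
PARCELS_PER_PACKET = 6


def interleave(num_parcels, per_packet=PARCELS_PER_PACKET):
    """Split parcels 1..num_parcels into odd and even packets, pair by pair."""
    packets = []
    span = 2 * per_packet                      # parcels covered by one pair
    for start in range(1, num_parcels + 1, span):
        block = list(range(start, min(start + span, num_parcels + 1)))
        packets.append(block[0::2])            # odd-numbered parcels
        packets.append(block[1::2])            # even-numbered parcels
    return packets


def longest_silence(packets, lost, num_parcels):
    """Longest run of missing parcels when the packets indexed in `lost` are dropped."""
    received = {p for i, pkt in enumerate(packets) if i not in lost for p in pkt}
    longest = run = 0
    for parcel in range(1, num_parcels + 1):
        run = 0 if parcel in received else run + 1
        longest = max(longest, run)
    return longest


packets = interleave(24)
one_lost = longest_silence(packets, lost={0}, num_parcels=24)
pair_lost = longest_silence(packets, lost={0, 1}, num_parcels=24)
print(one_lost, pair_lost)

# With independent losses at probability p, a long silence needs both packets
# of a pair to be lost, so its probability is p * p rather than p.
p = 0.1
silence_probability = p * p
```

Losing one packet leaves gaps of a single parcel each, while losing both packets of a pair produces one silence twelve parcels long.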


There  is  one  condition  under  which  none  of  the  foregoing 
redundancy  adding  schemes  would  work.  If  the  probability  of  a 
packet  being  delayed  was  not  independent  of  the  fate  of  other
packets,  the  chance  of  two  adjacent  packets  being  delayed
might  be  close  to  the  chance  of  a  single  packet  being  delayed. 
This  could  easily  happen  if  the  reason  for  a  packet  being

increased  overhead.

In  the  interleaving  scheme  outlined  above,  odd-numbered
parcels  are  transmitted  in  one  packet,  and  even-numbered 
parcels  in  a  second.  This  is  diagrammed  in  Figure  3a,  where 
each  digit  represents  a  parcel.  The  first  six  odd-numbered 
parcels  are  transmitted  in  the  first  packet,  and  the  first  six 
even-numbered  parcels  in  the  second.  There  is  a  temporal 
offset  of  one  parcel  between  packets  1  and  2,  but  an  offset  of 
11  parcels  between  packets  2  and  3.  There  are  some  advantages
to  staggering  the  interleaved  packets,  so  that  the  first 
parcel  of  the  later  packet  slots  into  the  middle,  rather  than 
the  start,  of  the  preceding  packet.  The  staggered 
interleaving  scheme  is  diagrammed  in  Figure  3b.  In  the  former 
scheme,  packets  become  ready  for  transmission  in  pairs,  which 
maximizes  the  chance  of  both  packets  being  delayed  if  network 
overload  is  the  cause  of  delay.  Thus,  packet  2  is  ready  for 
transmission  one  parcel  after  packet  1,  but  packet  3  is  not 
ready  until  11  parcels  after  packet  2  (with  six  parcels  per 
packet).  In  the  staggered  scheme,  this  risk  is  reduced,  since 
each  packet  becomes  ready  for  transmission  either  five  or 
seven  parcels  after  the  preceding  packet. 
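
The timing argument above can be checked with a short sketch. It is a hypothetical illustration rather than the layout of Figure 3: it assumes six parcels per packet, as in the text, counts time in parcels, and uses one staggered layout that is consistent with the description, in which each even packet starts five parcels after the start of the preceding odd packet. The function names are invented for this example.

```python
# Hypothetical sketch comparing when packets become ready for transmission
# under the two interleaving layouts. Times are counted in parcels.
PARCELS_PER_PACKET = 6    # value used in the text's example


def simple_ready_times(num_pairs, n=PARCELS_PER_PACKET):
    """Odd and even packets of a pair start one parcel apart."""
    times = []
    for pair in range(num_pairs):
        first_odd = 2 * n * pair + 1
        last_odd = first_odd + 2 * (n - 1)
        times.append(last_odd)        # odd packet is complete
        times.append(last_odd + 1)    # even packet completes one parcel later
    return times


def staggered_ready_times(num_pairs, n=PARCELS_PER_PACKET):
    """Each even packet starts near the middle of the preceding odd packet.

    Start-up parcels before the first even packet are ignored here.
    """
    assert n % 2 == 0, "sketch assumes an even number of parcels per packet"
    times = []
    for pair in range(num_pairs):
        first_odd = 2 * n * pair + 1
        last_odd = first_odd + 2 * (n - 1)
        first_even = first_odd + n - 1          # an even parcel near the middle
        last_even = first_even + 2 * (n - 1)
        times.append(last_odd)
        times.append(last_even)
    return times


def gaps(times):
    """Intervals between successive packets becoming ready."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


simple_gaps = gaps(simple_ready_times(3))
staggered_gaps = gaps(staggered_ready_times(3))
print(simple_gaps)
print(staggered_gaps)
```

With these assumptions the simple layout alternates between gaps of 1 and 11 parcels, and the staggered layout alternates between 5 and 7, matching the figures given in the text.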

A  second  advantage  of  a  staggered  scheme  of  interleaving 
is  that  the  decision  to  proceed  without  a  packet  can  be 
reviewed  at  the  start  of  the  next  new  packet.  If  the  late 
packet  has  arrived  by  then,  the  later  parcels  in  the  late 
packet  can  be  incorporated  in  the  reconstituted  speech.  This
procedure  would  often  halve  the  duration  of  interrupted  speech 
introduced  by  a  late  packet. 

We  propose  to  run  intelligibility  tests,  using  the  IEEE 
recommended  sentences,  to  test  the  correctness  of  the 
foregoing  arguments.  The  simplest  method  of  performing  the 
tests  is  to  acquire  recordings  of  the  sentences  that  have 
already  been  passed  through  a  variety  of  vocoding  systems,  and 
then  simulate  the  effects  of  lost  packets,  and  lost 
interleaved  packets,  by  appropriate  analog  switching  of  the 
waveform.  Any  comments  or  suggestions  will  be  appreciated.

APPENDIX  E

INSTRUCTIONS  TO  HIGH  SCHOOL  SUBJECTS 

1)  We  are  doing  research  on  ways  to  transform  speech  into  numbers 
so  that  people  can  speak  to  computers,  and  so  that  computers  can 
repeat  the  message  to  others,  while  sounding  just  like  the
original  speaker. 

2)  The  approach  requires  transforming  speech  sounds  into  strings  of 
numbers.

3)  That  is  not  difficult.  For  example,  take  an  electrical  signal 
from  a  microphone,  measure  the  voltage  and  feed  the  voltage 
readings  into  the  computer. 

4)  The  problem  is  that  in  order  to  end  up  with  computer  speech  that 
is  sharp  and  clear,  and  sounds  like  the  original  human  speaker, 
a  very  fine  record  of  the  voltage  changes  is  required.  It  takes 
thousands  of  numbers  to  represent  just  one  little  word. 

5)  What  we  are  trying  to  do  is  find  ways  of  taking  away  a  lot  of 
the  numbers  without  affecting  the  clarity  or  recognizability  of 
the  words. 

6)  Today  we  want  to  see  how  successful  some  of  these  approaches 
are.

7)  We  will  have  you  listen  to  some  words  spoken*  by  a  computer.
*Actually  the  computer  puts  out  voltage  readings  which  drive  a
Hi  Fi  set.  Sometimes  the  words  will  be  sharp  and  clear,  and
sometimes  they  will  be  very  difficult  to  hear.

8)  Because  you  might  be  able  to  recognize  familiar  words  even  if 
they  are  unclear,  we  will  use  artificial  words. 

9)  They  will  be  very  short  words  like: 

T  U  P
G  U  K
Z  I  M
S  I  Z

10)  We  will  tell  you  the  vowel  in  the  middle.  You  will  select  the 
consonants  on  one  or  both  sides. 

11)  Let's  do  some  examples:

A)  For  this  list  there  is  a  single  set  of  possible  consonants

.  The  consonants  are  b  d  g  v  z  zh

The  sound  of  each  is  familiar  except  perhaps  for  zh  -  as  in
azure.

The  vowels  are  ah  as  in  (father)
                 ih  as  in  (bit)

The  first  item  will  have  ih's  in  the  middle


When  I  say  the  word,  listen  for  the  first  and  last
consonant.

Tell  me  the  first  consonant  by  circling  it  in  the  left
string  on  the  answer  sheet.

Tell  me  the  final  consonant  by  circling  it  in  the  right
string  on  the  answer  sheet.

.  Every  word  will  be  preceded  by  ah 
Read" 


B)  Slightly  different  situation

.  String  of  possible  first  consonants  different  from  final 
consonants 

.  Sounds  of  consonants  familiar  except  perhaps  y  as  in  (yet) 
and  ng  as  in  (sing) 

.  Vowels  ah  as  in  (father),  ih  as  in  (sing) 

.  This  time  we  will  do  6  items  in  a  row 

.  Write  down  clock-count  you  see  on  clock  after  you  have 
circled  final  consonant  for  each  item.  Put  clock-count  in 
space  to  right  of  each  item. 

C)  Still  different  situation 

.  There  is  just  a  first  consonant 

.  Vowels  i  as  in  (beat),  ah  as  in  (father) 

.  Let's  do  six  items,  5  seconds  apart

.  Write  down  time  after  circling  the  consonant 


*Check  Answer  Sheet  (C)

12)  .  You  will  have  other  lists  as  well  as  these

.  Just  check  the  heading  for  consonant  sounds,  vowel  sounds.

.  All  items  will  be  5  seconds  apart

13)  Be  as  accurate  as  possible,  but  be  as  fast  as  possible. 

14)  Take  as  much  time  as  you  need  to  be  as  sure  as  you  ever  will 
be,  but  take  absolutely  no  more  time  than  you  have  to. 

15)  We  are  very  interested  in  whether  it  takes  longer  to  hear  some 
of  these  words  than  others. 

16)  To  show  differences  in  hearing  time,  you  have  to  respond  as 
quickly  as  possible. 

16a)  What  number  to  mark.  Number  you  are  sure  must  have  been  on 
clock  when  you  looked  up. 

16b)  Write  time  first,  then  fix  mistakes. 

17)  Now  having  said  that:  I  don't  want  you  to  blow  a  gasket  trying
to  be  super  good  -  at  the  start  -  and  then  be  so  wrung  out  that
you  do  a  bad  job  at  the  end.  This  will  be  a  long  session,  it
may  get  to  be  pure  drudgery.  Please  try  to  adopt  a  level  of
tension/effort  that  will  carry  you  through  to  the  bitter  end
operating  at  an  effective  level.

18)  Just  because  some  items  sound  like  you  heard  them  before,  don't
assume  they  are  same  or  if  same,  that  your  prior  response  was
right,  i.e.  make  independent  judgements  on  each  item.

19)  We  will  take  a  break  about  half  way  through,  cokes  on  the 
house.


